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Introduction 


The  use  of  discretized  integral  equations  for  engineering  analysis  of  electromagnetic  problems 
has  been  popular  at  least  since  the  appearance  of  Harrington’s  treatise[l]  nearly  thirty  years 
ago.  Algorithms  using  method  of  moments  [1]  or  weighted-residuals  [2,  3]  based  discretizations 
of  integral  equation  formulations  are  often  attractive  for  problems  with  complicated  problem 
domains  because  of  the  reduced  complexity  of  discretization  when  compared  with  competing 
approaches  such  as  finite-element  methods.  For  example,  boundary-element  methods[4]  for 
solving  elliptic  partial  differential  equations  require  discretization  only  of  domain  boundaries 
and  not  of  interior  or,  particularly,  exterior  volumes.  The  difficulty  with  such  approaches  is 
that  they  generate  dense  matrix  problems  which  are  computationally  expensive  to  solve,  and 
this  limits  the  complexity  of  problems  which  can  be  analyzed.  Recently,  the  combination  of 
iterative  linear  system  solution  algorithms  and  matrix  “sparsification”  techniques  has  been 
used  to  create  efficient  integral-equation  codes  [5,  6,  7]  and  has  resulted  in  renewed  interest  in 
integral-equation  methods. 

To  illustrate  the  difficulties  associated  with  discretized  integral  equations,  consider  the  model 
problem  of  capacitance  extraction.  The  capacitance  of  an  m-conductor  geometry  is  given  by  a 
(symmetric)  matrix  C  G  Rmxm.  The  entry  Cki  represents  capacitive  coupling  between  conduc¬ 
tors  Z  and  k .  In  a  general  sense,  the  capacitance  matrix  may  be  determined  from  the  matrix 
equation 

CV  =  Q  (1-1) 

where  the  columns  of  V  G  Rmxm  are  linearly  independent  “test”  vectors.  Vja  represents  the 
potential  of  the  kth  conductor,  for  the  Zth  test  vector.  Q  G  Rmxm  is  a  matrix  of  conductor 
charges,  that  is,  Q^i  is  the  charge  on  conductor  k  at  the  Zth  test.  For  example,  suppose 
the  potential  of  each  conductor  Z  is  set  to  unity  in  turn  and  the  potential  of  the  remaining 
conductors  set  to  zero  (this  corresponds  to  the  choice  of  the  Zth  column  of  V  to  be  the  Zth  unit 
vector).  The  resulting  conductor  charges  (the  Zth  column  of  Q)  will  give  the  Zth  column  of  the 
capacitance  matrix,  i.e.,  the  charge  Qki  on  conductor  k  is  the  capacitance  Cm-  Thus,  to  obtain 
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Figure  1-1:  Piecewise-constant  collocation  discretization  of  two  conductors.  Conductor  sur¬ 
faces  are  discretized  into  panels  which  support  a  constant  charge  density. 


the  capacitance  matrix,  it  is  necessary  to  solve  Laplace’s  equation  for  a  sequence  of  m  Dirichlet 
boundary  conditions.  If  the  conductors  are  embedded  in  an  infinite  homogeneous  dielectric, 
such  as  free  space,  a  first-kind  integral  equation  may  be  written[8,  9,  10]  for  the  charge  density 
a  which  lies  on  the  conductor  surfaces, 

^  =  L*ce/M  ’  <L2) 

where  tp(x)  is  the  known  conductor  surface  potential,  da'  is  the  differential  conductor  surface 
area,  x ,  x'  £  R3,  e  is  the  dielectric  constant,  and  ||a;||  is  the  Euclidean  length  of  x. 

A  standard  approach[l]  to  numerically  solving  (1.2)  for  the  charge  density  a  is  to  use  a  piece- 
wise  constant  collocation  scheme.  In  such  an  approach  the  conductor  surfaces  are  approximated 
by  a  set  of  n  polygons,  or  “panels”,  and  it  is  assumed  that  on  each  panel  i,  a  charge,  qi,  is 
uniformly  distributed,  as  in  Figure  1-1.  That  is,  the  charge  density  sigma  is  approximated  by 
an  expansion  in  basis  functions  9i , 


a(x)  &^2qidi(x),  (1.3) 

i=  1 

where  the  0*  are  given  by 


6i{x)  =  1  x  £  panel  i  (1.4) 

0i  (:/;)  =  0  otherwise. 


Then  for  each  panel,  an  equation  is  written  which  relates  the  known  potential  at  the  center  of 
that  i-th  panel,  denoted  and  given  at  the  1th  potential  solution  by  fi  =  Vu  if  panel  i  is  on 
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LU  factorization. 

conductor  k,  to  the  sum  of  the  contributions  to  that  potential  from  the  n  charge  distributions 
on  all  n  panels.  The  result  is  the  dense  linear  system, 


Pq  =  f  (1.5) 

where  P  E  R"x”,  q  E  Rn  is  the  vector  of  panel  charges,  /  €  Rn  is  the  vector  of  known  panel 
potentials,  and 

Plj  dj  Ipanelj  47re||rc*  -  x'\\  da  ’  (1-6) 

where  the  collocation  point  X{  is  the  center  of  the  z-th  panel  and  dj  is  the  area  of  the  jf-th  panel. 
For  each  column  l  of  V,  the  dense  linear  system  (  1.5)  is  solved  to  compute  the  panel  charges. 
The  Zth  column  of  Q  is  then  obtained  by  summing  the  panel  charges  on  each  conductor. 

The  direct  approach  of  solving  (1.5)  via  Gaussian  elimination,  which  requires  0(n 3)  opera¬ 
tions  and  0(n 2)  storage,  becomes  computationally  intractable  if  the  number  of  panels  exceeds 
several  hundred.  Figure  1-2  shows  that  the  resources  required  to  solve  even  a  moderately  large 
dense  matrix  problem  by  direct  methods  exceed  those  available  on  a  modern  (circa  1997)  sci¬ 
entific  workstation.  Continuing  advances  in  computer  hardware  make  tractable  larger  dense 
matrix  problems  than  were  accessible  just  a  few  years  ago,  but  the  0(n3)  and  0(n 2)  growth  in 
computational  resource  requirements  ensure  that  near-O(n)  methods  will  always  have  substan¬ 
tial  performance  advantages. 

To  avoid  the  formidable  computational  complexity  of  Gaussian  elimination,  an  iterative 
technique  such  as  GMRES  [11]  may  be  used  to  solve  (1.5).  Normally  each  iteration  of  GMRES 
will  cost  n 2  operations  because  the  matrix  in  (1.5)  is  dense,  and  therefore  evaluating  candidate 
solution  vectors  involves  a  dense  matrix-vector  multiply.  The  key  to  accelerating  the  iterative 
solution  process  is  the  insight  that  the  Pq  product  can  be  interpreted  as  a  potential  computation. 


Numerous  algorithms  have  been  proposed  for  the  rapid  evaluation  of  the  potential  of  a  set 
of  charges  at  all  the  other  charge  points.  Historically  the  first  algorithms  developed  for  this  “n- 
body  problem”  were  the  “particle-mesh”  methods  (see  [12]  for  extensive  references)  originally 
concieved  for  use  in  simulations  in  plasma  physics.  The  fast  multipole  method  (FMM)[5],  which 
has  enjoyed  much  recent  popularity,  and  multigrid  methods  [13],  which  are  probably  the  most 
versatile,  have  also  been  proposed.  Most  of  these  algorithms  work  by  separating  the  potential 
into  a  short-range  and  a  long-range  part  and  then  seeking  a  computationally  efficient  means  to 
approximate  the  long-range  part  of  the  potential.  The  various  algorithms  differ  in  the  way  the 
long-range  potential  is  approximated  and  in  the  way  local  interactions  are  treated. 

The  multipole-accelerated  iterative  approaches[7,  14,  5,  15],  now  commonly  used  in  a  variety 
of  engineering  applications  [16],  require  orders  of  magnitude  less  storage  and  computation  than 
algorithms  based  on  direct  matrix  representations,  but  are  still  computationally  expensive  and 
memory-exhausting.  More  importantly,  such  codes  have  heretofore  been  developed  only  for  1/r- 
type  kernels.  In  this  work  we  are  concerned  with  developing  algorithms  for  kernels  other  than 
the  1  /r  type  generated  by  free-space  Laplace  problems.  Of  primary  interest  is  developing  sparse 
representation  procedures  for  matrices  which  are  derived  from  potential  problems  with  relatively 
general  kernels,  at  least  including  1/r  and  for  a  wide  range  of  kr  [17,  18,  19,  20,  6,  21]. 

In  this  thesis,  we  describe  a  precorrected-FFT  approach  which  can  replace  the  fast  multipole 
algorithm  for  accelerating  the  potential  calculation  needed  to  perform  the  matrix- vector  product 
in  a  discretized  integral-equation  code.  The  central  idea  of  the  algorithm  is  to  represent  the 
long-range  part  of  the  potential  by  point  charges  lying  on  a  uniform  grid,  rather  than  by  series 
expansions  as  in  fast  multipole  algorithms[5].  This  grid  representation  allows  the  Fast  Fourier 
Transform  (FFT)[22,  23,  24]  to  be  used  to  efficiently  perform  potential  computations.  Because 
only  the  long-range  part  of  the  potential  is  represented  by  the  grid,  the  grid  is  not  coupled  to 
the  underlying  discretization  of  the  structure.  Decoupling  the  long  and  short  range  parts  of  the 
potentials  allows  the  algorithm  to  efficiently  solve  problems  which  may  be  discretized  in  a  very 
irregular  fashion. 

We  have  attempted  to  develop  an  algorithm  which,  like  particle-mesh  methods,  exploits  the 
availability  of  efficient  discrete  Fourier  transform  implementations  while  at  the  same  time  pre¬ 
serves  the  higher  accuracy  of  the  multipole-based  schemes,  but  is  also  (like  multigrid  schemes) 
easily  adapted  to  a  broad  class  of  kernels.  In  addition,  the  algorithm  is,  in  the  way  local  in¬ 
teractions  are  treated,  particularly  adapted  to  boundary-integral  solvers.  The  hierarchical  fast- 
multipole  algorithm  [5]  can  perform  the  dense  matrix- vector  product  associated  with  discretized 
1/r  potential  integral  equations  in  order-n  ( 0(n ))  time  and  memory.  The  precorrected-FFT 
method,  described  below,  is  at  best  an  O(nlogn)  algorithm.  It  is  possible  to  construct  ge¬ 
ometries  for  which  the  performance  of  the  precorrected-FFT  algorithm  is  inferior  to  the  fast 
multipole  methods,  but  we  demonstrate  that  for  many  realistic  structures  precorrected-FFT 
methods  are  faster  and  use  less  memory. 


This  performance  improvement  and  flexibility  comes  with  a  cost.  First,  the  precorrected- 
FFT  method  can  only  treat  integral  equations  with  a  fairly  restricted  class  of  kernels  -  but 
this  is  true  for  most  fast  integral  equation  solvers.  The  kernel  must  have  piece wise-smooth 
convolutional  form.  What  precisely  is  meant  by  piecewise-smooth  convolutional  form  will  be 
made  more  clear  in  Chapter  6,  and  for  now  it  is  sufficient  to  know  that  both  the  Laplace  and 
Helmholtz  kernel  and  their  close  relatives  are  of  this  form.  Second,  problems  which  are  very 
inhomogeneous,  that  is,  which  have  regions  of  densely  concentrated  field  sources  interspersed 
with  large  regions  of  empty  space,  or,  in  other  words,  vastly  varying  densitities  of  source  points, 
are  not  treated  well  by  the  algorithm.  This  is  perhaps  the  most  glaring  shortcoming  of  the 
algorithm.  Third,  the  way  in  which  the  algorithm  accounts  for  local  interactions  can  be  com¬ 
putationally  expensive  when  high  accuracy  is  desired.  It  is  possible  to  remedy  this  situation  at 
some  cost  in  accuracy. 

On  the  other  hand,  the  precorrected-FFT  method  does  not  suffer  from  some  salient  dis¬ 
advantages  of  alternative  schemes,  disadvantages  which  do  impact  their  practical  applicability. 
Wavelet-type[25]  approaches  are  not  applicable  for  oscillatory  kernels.  Fast-multipole  type 
algorithms  for  quasi-static  problems  do  not  extend  to  the  Helmholtz  case.  To  the  author’s 
knowledge,  available  fast  multipole  algorithms  for  Helmholtz  kernels  are  not  numerically  stable 
(or  have  0(n2)  complexity)  for  problem  domains  which  are  only  a  few  wavelengths  in  size.  The 
algorithms  proposed  in  [17,  18]  require  that  the  size  of  direct  interaction  regions  be  of  order  of  a 
wavelength,  otherwise  the  operators  that  diagonalize  the  multipole  translation  and  conversion 
operators  are  not  numerically  stable.  If  the  problem  domain  is  of  the  size  of  a  few  wavelengths, 
the  number  of  unknowns  contained  in  the  direct  interaction  regions  will  be  of  0(n)  making  the 
whole  algorithm  0(n2)  for  a  potential  computation.  Hybrid  schemes,  similar  to  the  approach 
proposed  in  [6]  in  the  context  of  multigrid  methods,  are  of  course  possible.  Many  important 
problems  of  engineering  interest,  such  as  problems  in  electromagnetic  compatibility  analysis 
and  package-level  interconnect  analysis,  fall  into  this  category. 

When  electromagnetic  analysis  moves  into  the  non-quasistatic  region,  an  additional  dimen¬ 
sion  is  introduced  into  the  analysis,  that  of  frequency-dependence  of  the  matrix  elements.  Often 
it  is  not  an  analysis  at  a  fixed  operating  frequency  that  is  required,  but  the  extraction  of  some 
frequency-dependent  response  function. 

The  simplest  method  of  obtaining  the  frequency  response  is  to  sample  the  complex  Laplace- 
domain  response  at  discrete  points  along  the  imaginary  axis.  The  frequency  range  of  interest 
may  span  several  decades,  however,  and  the  magnitude  of  the  response  may  also  vary  over 
a  similar  range.  Poles  close  to  the  imaginary  axis  can  result  in  very  sharp  spectral  features. 
Thus,  the  frequency  response  may  be  difficult  to  resolve  by  discrete  sampling.  An  alternative 
approach  is  to  exploit  the  analytic  behavior  of  the  electromagnetic  model  to  construct  a  rational 
approximation  to  the  frequency  response. 

Rational  approximation  is  a  useful  method  to  capture  the  frequency  response  of  a  network, 
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but  if  the  approximation  is  carefully  constructed,  it  can  also  be  used  to  derive  a  reduced-order 
model  for  use  in  time-domain  circuit  simulation.  Suppose  a  network  containing  both  nonlinear 
elements  and  large  multiport  interconnect  networks  is  to  be  analyzed.  An  efficient  approach  is 
to  first  reduce  the  linear  interconnect  subcircuits  to  equivalent  but  much  smaller  macromodels 
which  reflect  the  salient  behaviour  of  the  original  subcircuits.  The  reduced  order  models  are 
inserted  as  a  “black  box”  into  the  remaining  circuit  which  is  then  given  to  simulation. 

Methods  based  on  rational  approximation,  such  as  asymptotic  waveform  evaluation  (AWE)  [26], 
have  been  popular  for  constructing  low-order  models  of  circuit  interconnect.  Algorithms  which 
are  based  on  moments  of  the  frequency  response  have  certain  numerical  stability  problems, 
but  recently  the  connection  between  Pade  approximation  and  the  Lanczos  algorithm[27]  has 
been  exploited  to  obtain  orthogonalized  Krylov-subspace  algorithms  which  can  stably  form 
high-order  rational  approximants  to  the  frequency  response  of  lumped  linear  systems[28,  29]. 
However,  one  advantage  of  moment  based  approaches  is  that  they  can  construct  approxima¬ 
tions  to  systems  with  irrational  system  response  (e.g.,  a  system  with  delay  elements  such  as 
transmission  lines  or  the  retardation  factors  of  a  full-wave  electromagnetic  model)  almost  as 
easily  as  for  a  rational  response[30,  31],  whereas  Krylov-subspace  model-reduction  algorithms 
for  systems  with  irrational  response  have  yet  to  be  demonstrated. 

We  begin  our  exposition  in  Chapter  2  by  discussing  some  fundamental  problems  in  approxi¬ 
mation  of  potential  functions.  This  discussion  leads  to  the  development  of  the  precorrected-FFT 
algorithm  in  in  Chapter  3.  Analysis  of  the  computational  complexity  of  the  algorithm,  for  sim¬ 
ple  examples,  takes  place  in  Chapter  4.  In  Chapter  5  we  discuss  application  of  the  algorithm 
to  free-space  Laplace  problems.  Extensions  to  Laplace  problems  in  stratified  geometries  and 
to  the  scalar  Helmholtz  equation  follow  in  Chapters  6  and  Chapter  7  respectively.  Application 
to  full-wave  electromagnetic  analysis  takes  place  in  Chapter  8.  Chapter  9  deals  with  efficient 
extraction  of  frequency  response  functions  in  the  context  of  full-wave  analysis.  Finally,  in 
Chapter  10,  we  discuss  some  possible  extensions  of  the  algorithms  presented. 
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Approximation  of  potential 

functions 


In  this  chapter  the  problem  of  approximating  the  potential  generated  by  a  fixed  charge 
distribution  is  considered.  We  derive  error  bounds  for  a  collocation-grid-projection  scheme 
tuned  for  use  in  multilevel  methods  for  solving  boundary-element  discretizations  of  potential 
integral  equations.  This  projection  scheme  will  be  used  in  the  algorithm  developed  in  Chapter 
3. 

2.1  The  Approximation  Problem 

In  order  to  perform  an  n-charge  potential  computation  in  less  than  quadratic  complexity,  it  is 
clear  that  some  of  the  potential  interactions  must  be  approximated.  The  form  of  approximation 
chosen  will  determine  many  of  the  features  of  the  ultimate  potential  computation  algorithm. 

Consider  first  the  approximation  of  the  potential  ^  of  a  single  point  charge  located  at 
the  origin.  The  most  obvious  choice  for  a  function  approximation  method  is  polynomial 
interpolation[32j.  Figure  2-1  shows  second-order  polynomial  interpolation  applied  to  the  1/r 
Laplace  kernel  at  two  observation  points.  For  sufficiently  large  r,  the  approximate  (interpo¬ 
lated)  potential  ^  is  a  good  approximation  to  the  true  potential  ^  of  a  point  charge  at  the 
origin.  The  leading  order  error  term  is 

if  h  is  the  uniform  spacing  of  the  interpolation  nodes.  This  error  term  becomes  large  near  the 
singularity  at  the  origin,  as  can  be  seen  from  the  inaccurate  representation  shown  in  the  left 
side  of  Figure  2-1. 

In  more  general  terms,  interpolation  is  the  process  of  approximating  the  function  (f>{x)  by  a 


Isotropic 

far-field 


Figure  2-1:  Left:  Interpolation  of  the  1/r  Green  function.  Interpolation  is  accurate  away 
from  the  origin  but  inaccurate  near  the  singularity.  Right:  Polar  plot  of  the  magnitude  of  the 
potential  of  a  charge  distribution  far  away  from  the  sources. 


weighted  sum  of  tabulated  values, 

<i>(x)  =  (2.2) 

i 

where  &i  are  the  interpolation  nodes,  which  we  will  refer  to  as  the  grid  points,  or  just  the  grid 
and  the  dependence  of  the  weights  Wi  on  x  and  X{  is  implied.  This  process  is  essentially  the  same 
as  the  problem  of  approximating  the  potential  of  the  point  charge  by  the  potential  of  another 
set  of  point  charges  which  lie  at  the  gridpoints,  as  can  be  seen  from  the  following  result  [6]: 

Theorem  2.1.  Given  V  e  Rmxl  is  an  operator  which  projects  charge  onto  a  grid  of  m  points, 
then  VT  may  he  interpreted  as  an  operator  which  interpolates  potential  at  grid  points  onto  charge 
coordinates;  conversely,  given  VT  €  Rlxm  is  an  operator  which  interpolates  potential  at  m  grid 
points  onto  charge  coordinates,  V  may  be  interpreted  as  an  operator  which  projects  charge  onto 
the  grid  coordinates.  In  either  case,  V  and  VT  have  comparable  accuracy. 

Proof.  Let  G(x,y)  be  the  Green  function  for  a  source  at  y,  evaluated  at  x.  Suppose  that 
a  unit  charge  at  the  point  xq  is  represented  by  the  vector  of  grid  charges  q.  The  approximate 
potential  \E'!C( y )  at  a  point  yo  is  given  by 

y'x{yo)  =  YJG{yo,xi)q  =  gTq  (2.3) 

i 

where  xL  is  the  position  of  the  ith  grid  charge  and  g  €  Rm,  <ji  =  G(yo,  Xi).  Conversely,  suppose 
there  is  a  unit  charge  at  yo,  and  the  potential  ^(xq)  at  xq  is  to  be  computed  by  interpolating 
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potentials  produced  by  this  unit  charge  at  the  grid  points  X{.  Then,  if  VT  is  the  interpolation 
operator, 

*;(*o)  =  E  VT(x0,  Xi)G(xi ,  yo).  (2.4) 

i 

For  a  symmetric  Green  function,  ^ v(xq )  =  G(xo,yo)  =  G(y o,xo)  =  ^x(yo),  and 

Y^VT(xo,Xi)G(xi,y0)  =  VTg  (2.5) 

i 

so  that 

^(*0)  "  *,(*0)  =  VTg  -  *y(x0)  =  gTV  -  ®«(y0).  (2-6) 

Now  suppose  q  is  assigned  the  value  V  in  order  to  represent  the  unit  point  charge  at  xq.  Then 

K(vo)  -  ’M2/0)  =  (gTqg)  -  *x(yo)  =  gTV  -  *,(»«,)  =  ^(*0)  -  (2-7) 

□ 


For  a  perspective  on  the  relative  efficiency  of  polynomial  interpolation,  consider  expanding 
the  potential  into  a  multipole  series [33], 


oo  l 

=  E  E  zmY‘m(0,<P) 

1=0  m=—l 


(2.8) 


where  c^m  are  the  multipole  expansion  coefficients.  Suppose  for  the  moment  that  <f>  is  the 
potential  of  a  point  charge  located  at  (a,  #',<//).  From  the  addition  theorem[33], 


1 

llx-x'll 


00  1 

=  EE  -jtt 

2=0  m=~l  T> 


(2.9) 


where  x  =  (r,  0 ,  <f>),x*  =  (r7, 0',  </>'),  r<  denotes  the  smaller  of  r  ,  r',  and  r>  the  greater,  it  follows 
that  the  are 

ci,m  =  alYi*m(e',  (/>').  (2.10) 


If  (f>(N)  is  the  truncation  of  the  series  to  N  terms,  i.e. 

N  l 


HN)  =  £  E 


(2.11) 


2=0  m=—l 

then  from  bounding  the  magnitude  of  the  tail  of  the  series,  it  follows  that 


I  ([ 

a\N+l) 

r  V 

r)  j 

(2.12) 


Using  order-p  polynomial  approximation  to  choose  interpolation  weights  gives  errors  of  order 
(a/r)p+1  and  requires  (p  +  l)3  coefficients.  Clearly,  this  is  not  the  most  efficient  approximation 
scheme  available,  as  approximations  using  multipole  expansions  achieve  order  (a/r)p+1  accu¬ 
racy  with  only  (p  +  l)2  coefficients.  The  convenience  of  polynomial  interpolation,  however,  is 
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that  only  the  tabulated  function  values  on  equally  spaced  nodes  are  required  for  the  interpola- 
tion/anterpolation  operation,  as  opposed  to,  e.g.,  multipole  coefficients,  which  can  be  awkward 
to  compute  for  complicated  source  geometries.  It  seems  reasonable  to  ask  if,  for  harmonic 
functions,  approximation  schemes  combining  the  advantages  of  both  approaches  exist.  That  is, 
we  wish  to  obtain  a  scheme  which  uses  tabulated  potentials  at  equally  spaced  nodes  but  has 
better  than  polynomial  accuracy  for  functions  satisfying  the  Laplace  equation. 

2.2  The  Grid-Potential-Collocation  Approach 

Consider  approximating  the  potential  of  a  charge  distribution  p(x)  by  a  set  of  Nq  point 
charges,  Qj,j  =  1 . . .  Nq  which  are  positioned  at  points  Xj.  Suppose  also  that  both  the  point 
charges  and  the  charge  distribution  lie  entirely  inside  a  sphere  of  radius  a  centered  at  the 
origin.  We  will  require  that  the  potential  of  the  point  charges  and  the  potential  of  the  true 
charge  density  match  at  a  set  of  T  <  Nq  collocation  points  xCiT,  r  =  1 . . .  T  on  a  closed  surface 
which  encompasses  the  sphere  of  radius  a.  That  is,  for  each  r, 

Y^QjG(xj,xCtr)  =  J  p(x')G(x',xCjT)dx'  (2.13) 

j 

where  G(x,x')  is  the  relevant  Green’s  function.  It  will  be  convenient  if  the  surface  is  chosen 
to  be  a  sphere  of  radius  rc  >  a,  and  the  collocation  points  are  chosen  to  be  the  abscissas 
of  a  quadrature  rule  on  the  sphere.  Integration  rules  of  arbitrary  order  on  a  sphere  can  be 
constructed  by  product  techniques,  but  more  efficient  non-product  rules  exist  [34]  which  will 
generally  be  sufficient  for  our  purposes. 

To  motivate  a  scheme  for  representing  panel  charges  with  weighted  point  charges  lying  on 
a  grid,  consider  a  charge  distribution  p(x)  contained  entirely  within  some  small  volume  B.  The 
potential  outside  B  due  to  p  can  be  determined  from  the  knowledge  of  the  potential  on  a  surface 
S  surrounding  £[35].  For  example,  suppose  p  is  contained  within  a  sphere  S  of  radius  a.  For 
all  (r,  9,  <j>)  with  r  >  a,  the  potential  4>  can  be  written  as  a  multipole  expansion  series 

OO  l 

=  E  E  ifil YimiOA)  (2-14) 

1—0  m—-l 

where  the  Y[m  are  spherical  harmonics  and  the  qm  coefficients  of  expansion.  Since  the  spherical 
harmonics  are  orthogonal  on  a  sphere,  if  the  potential  is  known  on  any  sphere  S  of  radius 
rc  >  a,  the  multipole  moments  qm  can  be  computed  as 

Cim  =  rlc+1  f  dSlYCm<t>{rc,0,<j>).  (2.15) 

J  s 

The  above  observation  suggests  a  scheme  for  computing  the  grid  charges.  We  presume  that 
the  charge  whose  potential  is  to  be  approximated,  and  the  approximating  grid  charges,  lie  inside 
a  parallelipiped  denoted  a  “cell.”  Suppose  a  pxpxp  array  of  grid  charges  is  used  to  represent 
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Figure  2-2:  Potentials  tp(r  <  rc,  9,  <j>)  or  ip(f  >  rc,  0,  <j>)  for  r,f  >  a  may  be  obtained  from  the 
potential  at  r  =  rc. 


the  charge  in  the  cell.  First,  Nc  test  points  are  selected  on  the  surface  of  a  sphere  of  radius 
rc  whose  center  is  coincident  with  the  center  of  the  given  cell.  Then,  potentials  due  to  the  p3 
grid  charges  are  forced  to  match  the  potential  due  to  the  cell’s  actual  charge  distribution  (say 
m  panel  charges  q  )  at  the  test  points,  i.e. 

Pgtq  =  P^q  (2.16) 

where  Pgt  €  R NcXp3  js  the  mapping  between  grid  charges  and  test  point  potentials,  given  by 


Pf\  = 


hj 


Fi  —  xj\ 


(2.17) 


Here  x\  and  Xj  are  the  positions  of  the  i-th  test  point  and  the  j-th  grid  point,  respectively. 
pqt.  ^  *s  mappjng  between  panel  charges  and  test  point  potentials  and  is  given  by 

1 


%-L 


panel  j  \\x\  x'\ 


:da! . 


(2.18) 


Since  the  collocation  equation  (2.16)  is  linear  in  the  panel  and  grid  charge  distributions,  the 
contribution  of  the  jth  panel  in  the  cell  to  q  can  be  represented  by  a  column  vector  W.  W  is 
given  by 

W  =  [Pgtf  Pqt’j  (2.19) 

where  Pqt^  denotes  the  jth.  column  of  Pqt  and  [Pqt]^  indicates  the  generalized  Moore-Penrose 
(or  pseudo-)  inverse  [36]  of  [P9t]^ .  The  computation  of  [P5t]^  is  done  using  the  singular  value 
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FIGURE  2-3:  2-D  Pictorial  representation  of  the  grid  projection  scheme.  The  black  points  (at 
x)  represent  the  grid  charges  (q)  being  used  to  represent  the  triangular  panel’s  charge  density 
a.  The  white  points  are  the  the  points  xl  where  the  potential  due  to  the  black  point  charges 
and  the  potential  due  to  the  triangular  panel’s  charge  density  are  forced  to  match.  The  grid 
charges  approximate  the  panel  potential  outside  the  gray  region. 


decomposition.  The  error  analysis  below  will  depend  on  the  boundedness  of  W .  For  arbitrary 
order  of  approximation  and  quadrature  rules,  [ P9t ]t  is  unbounded,1  but  for  the  relatively  low 
order  rules  of  interest  in  this  thesis,  it  is  a  simple  matter  to  check  that 


K  = 


(2.20) 


remains  small.  The  grid  projection  scheme  is  summarized  in  Figure  2-3. 

The  accuracy  of  the  above  projection  scheme  hinges  on  the  proper  selection  of  the  test 
points  xl.  From  Equation  2.15,  we  expect  high  accuracy  if  the  test  points  are  chosen  to  be 
abscissas  of  a  high-order  quadrature  rule[34].  It  will  be  shown  that  the  error  in  potential  due 
to  the  grid-charge  approximation  of  a  charge  distribution  contained  within  a  sphere  of  radius 
a,  at  a  distance  r  from  the  center  of  the  distribution,  is  of  order  (a/r)(M+1^2  if  the  test  points 
are  chosen  to  be  the  nodes  of  a  quadrature  rule  accurate  to  order  M[38]. 


2.3  Error  analysis 

First  we  establish  error  bounds  for  the  approximation  of  a  panel  charge  potential  by  grid 
charges. 


1This  follows  from  results  in  [37]  and  Equation  2.29  below. 


Theorem  2.2.  Suppose  a  grid-charge  representation  of  a  charge  distribution  p(x),  of  total 
charge  Q  =  f  \p{x*)\ dxf,  lying  inside  a  sphere  S(a)  of  radius  a  centered  at  the  origin,  has  been 
constructed.  Assume  the  grid  charges  Qj  are  given  at  points  Xj,  j  =  1 . . .  Nq,  and  define 
Qg  —  Y^jLi  \QjV  R  possible  to  choose  a  radius  rc  of  the  collocation  sphere  such  that  the 
error  <fe  in  the  grid- charge  approximation  of  the  potential  in  the  k  =  0  case  satisfies 

Q  +  QG/a\^M2  +  M  +  l  Q  +  QgMm+1  1 
r  \rj  1  -  (a/rc)  r  \r )  1  -  (ajr) 

where  M  is  the  order  of  the  quadrature  rule  and  r  is  the  distance  from  the  evaluation  point 
to  the  origin. 


Proof.  The  multipole  expansion  of  potential  <f  of  the  charge  distribution  is  [33] 

OC  l  11/* 

#\M)  =  4»£  £  (2.22) 

1=0  m=— l 

Similarly,  the  multipole  expansion  of  the  grid-charge  potential  <frg(r,  9,  <fi)  is 

oo  l  i  -j  No 

#,M,*)=4>r£  £  (2.23) 

1=0  m=—l  j— 1 


Let  ( rc,0T,(f)T )  denote  the  rth  collocation  point,  r  =  1...T,  on  the  surface  of  the  sphere 
of  radius  rc.  Assume  that  the  {0T,<j)T)  are  the  abscissas  of  a  quadrature  rule  on  a  sphere  such 
that  the  rule  exactly  integrates  spherical  polynomials  of  degree  at  least  M.  Let  wT,  r  =  1 . . .  T 
denote  the  quadrature  rule  weights  corresponding  to  a  sphere  of  radius  unity. 

The  error  in  the  potential,  <j)e,  may  be  expanded  in  multipole  series, 

oo  l 

Mr, M)  =  Z  E  %Ylm(0, <£)  (2.24) 

l=0m=-l  T 


where  the  qi,rn  a  re  the  multipole  expansion  coefficients  given  by 
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— -  /  r'lp{x')YM0',  <j>')dx'  -  £  Q/jYMOj,  4>j)  • 

21 +  1  JS(a) 


(2.25) 


The  collocation  condition  can  be  used  to  put  bound  on  the  magnitude  on  the  low-order  qitTn. 
At  a  collocation  point,  the  error  in  the  potential,  <t>e{rc,  9T,  <f>r)  =  Me,  9T,  4>t)  —  <fig(rc,  9T,  <f>r )  is 
zero,  so  we  may  write 


M  l  j 

^  El  - i+iQl,mYlm{@T>  4>t)  — 

1=0  m=-l  Tc 

00  1  1  i 

4*  E  E  2/+1  l+l 

l=M+lm=-l  M  +  1  Tc 


(2.26) 
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JS(a)  j=1 
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for  r  =  1 . . .  T. 

Prom  the  identity 


J  =  6w6mm>, 

and  the  quadrature  rule  for  selecting  wT,0T,<f)T,  it  then  follows  that 

^  {8T,  <f>T)Yim{9T,  <f>T)  =  Su'Smm 


(2.27) 


(2.28) 


for  l  +  V  <M.  Therefore,  multiplying  each  side  of  (2.26)  by  wrY{,mi  (0T,  <pT)  and  summing  over 
t  leads  to  a  simplified  form, 
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(2.29) 
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The  addition  theorem  for  spherical  harmonics  states 
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2f+~i  E  Yl*<m(0',<f>')Yl,m(0’<l>)  =  Pit «*7) 


(2.30) 


m=—l 


where  7  is  the  angle  between  (O',  <j>')  and  (0,  <f>)  and  Pi  (cos  7)  a  Legendre  polynomial. 
Using  the  addition  theorem  to  collapse  the  spherical  harmonics,  we  have 

N  l' 


MN)  =  J2  E  -TTiH',m'Yvm'(0,4>)=  (2-31) 
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for  some  angles  a,  (3, 7  and  N  <  M  +  1. 

Since  |P;(cos7?)|  <  1  for  any  77, 
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(2.32) 


and  since  2J  wt  —  47T 


lWA0l<^J>r  +  i)(^y'(£ 


M+l-V 


(a/rc) 


(2.33) 


The  total  error  is  the  sum  of  <fe(N)  and  the  contribution  from  the  neglected  multipole  terms 
of  orders  >  N.  Various  bounds  can  be  obtained  by  manipulating  the  choices  of  N  and  rc  in 
Eq.  2.33.  It  is  particularly  instructive  to  consider  N  =  M,  rc  =  y/ar,  in  which  case 


a  _  rc  _  fa 
rc  r  V  r 


(2.34) 
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so  that 


\MN)\  < 


(?) 


Q  +  Qg  [a\M?1  M2  +  M  +  l 
1  -  (a/rc) 


(2.35) 


Adding  in  the  contribution  from  the  neglected  terms  in  the  error  series,  we  have  a  final 
bound  for  the  error  <^e: 


\4>e\  < 
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(2.36) 

□ 


We  now  have  the  main  result  of  this  section. 

Theorem  2.3 .  Suppose  the  potential  of  a  point  charge  is  given  by  1/r.  The  grid-based 
technique  for  evaluating,  outside  a  sphere  of  radius  rm,  the  potential  of  a  charge  density  of  total 
charge  magnitude  Q,  located  inside  a  sphere  of  radius  a,  has  error  f>e  bounded  by 


\<t>e\  <  (1  +  «)^-(^-)^1 


(M  +  l)2 

1  \/  ( ) 


(2.37) 


where  M  is  the  order  of  a  quadrature  rule  on  a  sphere  and  rm  is  the  distance  of  the  nearest 
potential-evaluation  evaluation  point  to  the  origin,  rm  >  a  and  is  defined  in  Eq.  2.20. 


Proof.  The  theorem  follows  directly  from  Theorems  2.1  and  2.2.  □ 

Theorem  2.3  provides  an  error  bound  and  thus  a  convergence  proof,  but  the  bound  given 
is  not  especially  useful  as  an  error  estimator  in  empirical  studies,  as  it  was  derived  for  rc  a 
function  of  r  and  a.  To  obtain  more  appropriate  results  it  is  necessary  to  divide  the  problem 
into  two  regions  of  study:  rc>  r  and  rc  <  r.  We  shall  see  empirically  and  analytically  that  the 
error  behavior  of  the  grid-collocation  method  is  qualitatively  different  in  the  two  regimes.  It  is 
a  simple  matter  to  obtain  the  following  from  Equation  2.33,  for  fixed  rc: 


\<t>e\< 


Q  +  Qc(<L^M^M  +  l+Q±Qc,a^l 

\rcJ  l-(a/rc)  r  \r )  1  -  (a/r) 


\f>e\< 


Q  +  Qc  ■  rc>r  (239) 

\rj  1  -  (a/rc)  r  \rj  1  -  (a/r) 


As  a  function  of  r,  we  expect  the  errors  to  decrease  sharply  only  until  r  ~  rc,  afterwards  it 
is  the  aliasing  errors  that  dominate  the  approximation. 
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2.4  Helmholtz  Kernels 


Consider  now  approximating  the  potential  of  a  point  source  with  Green  function  elkr  jr. 
Figure  2-4,  analogous  to  2-1,  shows  interpolation  of  such  a  kernel.  On  the  left  of  Figure  2-4  we 
see  that  the  introduction  of  the  new  length  scale  A  =  27r/A;  has  made  the  interpolation  inaccurate 
away  from  the  singularity,  because  the  node  spacing  is  insufficiently  small  compared  to  A.  On 
the  right  is  shown  the  intensity  of  the  potential  for  one  possible  configuration  of  sources,  for  a 
large  value  of  ka  where  a  is  the  dimension  of  the  source  region.  We  see  that  the  potential  can 
take  on  arbitrarily  complicated  patterns  far  from  the  sources.  This  is  different  from  the  1/r 
case  where,  sufficiently  far  from  a  source  region,  the  potential  is  always  isotropic  (the  potential 
tends  to  the  1/r  monopole).  To  accurately  approximate  the  potential  these  patterns  must  be 
resolved. 

It  is  important  to  distinguish  between  the  one-  and  many-  dimensional  cases  when  discussing 
accuracy  of  various  interpolation  schemes.  In  one  dimension,  polynomial  interpolation  requires 
p  +  1  coefficients  to  achieve  order-p  accuracy.  In  three  dimensional,  polynomial  interpolation 
can  be  done  by  a  sequence  of  one-dimensional  interpolations,  and  just  as  for  the  1/r  kernel, 
(p  +  l)3  coefficients  are  needed  for  order-p  accuracy.  But  again,  using  multipole  expansions, 
(p  +  l)2  coefficients  are  needed  for  order-p  accuracy.  The  exponent  of  two  comes  from  the  fact 
that  the  multipole  expansions  efficiently  represent  the  far-field  radiation  patterns,  which  live 
on  a  two-dimensional  surface.  Polynomials,  in  contrast,  must  sample  the  three-dimensional 
fields  on  a  scale  of  A.  It  is  hoped  that  the  grid-collocation  method  can  preserve  the  efficiency 
of  multipole  representations  by  matching  the  far-field  data.  In  this  section,  we  will  show  that, 
just  as  for  the  1/r  case,  the  grid-collocation  method  obtains  accurate  approximations  to  the 
low  order  multipole  coefficients,  which  is  equivalent  to  low-order  interpolation  of  the  far-field 
patterns.  Section  2.5  discusses  modifications  to  the  grid-collocation  method  that  directly  utilize 
the  far-field  data. 

In  the  previous  section  we  obtained  error  bounds  for  the  grid-collocation  technique  using 
a  three-step  process.  The  first  step  was  to  bound  the  sum  of  a  tail  of  a  multipole  expansion 
series.  The  second  was  to  use  the  collocation  condition  to  obtain  an  expression  for  the  multipole 
coefficients  of  the  error  in  terms  of  a  series  tail.  Finally,  by  bounding  the  sum  of  this  “aliasing” 
error  due  to  the  errors  in  the  low-order  multipole  coefficients,  an  error  bound  for  the  grid- 
collocation  technique  was  obtained.  A  similar  approach  can  be  followed  when  the  kernel  is 


eikr  jr. 


In  the  low-frequency  limit,  k 


Z  Hjn(z) 
zn+1yn(z) 


0,  the  asymptotic  expressions[39] 

1 


(2.40) 


1  •  3  •  5 . . .  (2n  +  1) 

— 1  ■  3  •  5. . .  (2n  —  1)  (2.41) 

make  the  extension  trivial.  Additionally,  the  spherical  Bessel  functions  are  known  to  approach 
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Far-field 

radiation 

patterns! 


Figure  2-4:  Left:  Interpolation  of  the  real  part  of  the  etkr /r  Green  function.  Interpolation 
#  is  inaccurate  near  the  singularity  as  in  the  1/r  case,  but  may  also  be  inaccurate  away  from 

the  singularity  if  the  spacing  of  the  interpolating  points  is  large  compared  to  the  characteristic 
wavelength  2n/k.  Right:  Polar  plot  of  the  magnitude  of  the  potential  of  a  hypothetical  charge 
distribution  in  the  far-field  limit. 

0  the  asymptotic  limits[39,  18] 
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71 — ►OO 

2(2„+l)"« 
3ny  znen+1/2 

=  1 

(2.42) 

• 

lim 

n—>oo 

pn-\~\/2 

hn(z)  * 

V  v2(2n  +  l)n 

=  1 

(2.43) 

and  from  these  expressions  we  can  draw  similar  conclusions,  as  long  as  the  order  of  the  quadra¬ 
ture  rule  is  sufficiently  large.  For  larger  kL,  with  L  the  size  of  the  computational  region,  and 
moderate  order  M,  however,  the  situation  is  more  difficult. 

The  following  bounds,  which  can  be  obtained  from  10.1.13  and  10.1.16  in  [39],  will  be  needed: 


i*-wi  s 

(2.44) 

• 

(2.45) 

Consider  first  the  error  present  in  truncating  a  multipole  expansion  of  a  Helmholtz  field  to  N 
terms.  As  is  the  usual  case,  suppose  that  outside  a  sphere  of  radius  a,  a  function  satisfying 
the  Helmholtz  equation  is  represented  by  a  multipole  expansion  whose  moments  up  to  order  N 
vanish 

OO  l 

xp(r,e,4>)=  Y  E  Pi,mhf\kr)Ylm(e,cj>)  (2.46) 

Z=JV+ 1  m——l 
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where  k  is  the  wavenumber  and  h\^  =  ji(kr)  +  iyi(kr)  is  the  first-kind  spherical  Hankel  function 
of  order  l. 

Theorem  2Jf.  For  N  >  ka  and  any  r  >  a,  there  exist  constants  Ci,C2  >  0,  independent  of 
a,  r  and  k,  such  that 

/  ? _  \  AM-1 

(2.47) 


IVfoM)!  <afyN+1  +  c2 


Proof.  Follows  directly  from  Eqs.  2.44  and  2.45. 


□ 


The  second  term  occurs  due  to  the  fact  that,  regardless  of  how  far  the  potential  evaluation 
point  is  from  the  source,  the  complex  far-field  radiation  pattern  must  be  resolved.  We  expect 
that  as  long  as  the  order  of  the  quadrature  rule  used  in  the  grid-collocation  technique  is  larger 
than  about  2 ka,  it  should  be  possible  to  accurately  determine  all  the  significant  multipole 
coefficients  of  the  potential. 

Proposition  2.5.  Suppose  the  potential  of  a  charge  is  given  by  elkr  fr.  If  the  collocation  points 
in  the  grid-charge  assignment  are  chosen  to  be  the  abscissas  of  a  quadrature  rule  which  exactly 
integrates  spherical  harmonics  of  order  <  2k  a,  i.e., 


M  >  2 ka 


(2.48) 


for  a  quadrature  rule  of  order  M,  then  the  grid-based  technique  for  evaluating,  outside  a  sphere 
of  radius  rm,  the  potential  of  a  charge  density  of  total  charge  magnitude  Q,  located  inside  a 
sphere  of  radius  a,  has  error  <f>e  bounded  by 


\4>e\  <  Ci(l  + 

rm  rm 


M  +  l 
2 


(M  +  l)2 


1  -  s/a/rn 


+  C2 


/  ka  \M+1 
\M  +  1/ 


(2.49) 


for  some  constants  ci,c2  independent  of  a,r,k. 


Proof.  ( Sketch )  Given  the  conditions  of  Theorem  2.4,  the  proof  follows  similarly  to  the  proof 
of  Theorem  2.3. 

The  multipole  expansion  of  potential  <f>  of  the  charge  distribution  is  [33] 


OO  l  p 

Hr ,  6, 4>)  =  4t vik  £  £  h,(*r)[  /  Mkr^pix')^',  ct>')dx']Ylrn{0,  <f>).  (2.50) 

l=0m=-l  ■/5(°> 

and  the  multipole  expansion  of  the  grid-charge  potential  <f>g  (r,  9,  <j>)  is 

OO  l  Nq 

<j>g(r,0,<t>)=4irikJ2  E  Hkr^Qjjtikr^Y^HWUe^H-  (2.51) 
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The  condition  that  the  error  <£e(rc,  0T ,  <j>T)  =  </>(rc,  0T,  <j)T )  —  (f>g{rc ,  0T>  <Ar)  in  the  potential  at 
a  collocation  point  (rc,  0T,  </>T)  is  zero  is 


M  l 

EE  hi  {0  r  5  <pr)  — 

/=0  m=—l 


(2.52) 


II  /■ 
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l=M+ 1  m=-J  j=l 


for  r  =  1 . . .  T,  where  r/;  m  is  given  by 


From  this  the  low-order  multipole  coefficients  qv  jn'  of  the  error  are, 


,  Nq 

qi,m=  47T ik  jl{kr')p{xl)Yirn(6l ,  4>')dx>  -  E  •  (2-53) 

JS(  a) 


hi'(krc)qi',m'  = 


(2.54) 
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QjJl(krj)Yim{Oj‘>  <t>j)  ^>r)^m(^T5  0t) 

j=i 

and  after  collapsing  the  spherical  harmonics,  the  expression  for  the  error  in  the  potential  <£e(iV) 
due  to  the  low-order  multipole  coefficients  is 

N  V 

0e(^O  —  y  ^  ^  hif(kr)qitiTnfYitmt  (0,  (f>)  = 
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for  some  angles  a,  /?,  7. 

Thus,  the  expression  analogous  to  Eq.  2.32  which  must  be  bounded  is 

\MN)\  <  E  E  (2/'  +  !)(2;  + 

If  we  again  suppose,  for  example,  N  =  M,rc  =  y/ar ,  in  which  case 

a  tv  fa 


(2.56) 


tv  r  V  r 


so  r  >  rc,  and  additionally  require  krc  >  M,  then  it  is  straightforward  to  show 


^  <4  (f) 


M±1  .  ,  .  M±i 

2  .  f  ka  \  2 

+  C2(mTi) 


(2.57) 


(2.58) 


for  some  Cj .  c^. 
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2.5  Utilizing  the  far-field  expressions 

Potentials  generated  by  a  Helmholtz  kernel  possess  structure  in  the  far-field  (r  — *  oo)  limit 
that  can  be  used  to  represent  the  potentials  in  much  the  same  way  as  the  multipole  coefficients. 
Essentially,  the  multipole  coefficients  are  Fourier  coefficients  for  the  expansion  of  the  far-field 
functions  in  a  basis  of  spherical  harmonics.  The  far-field  potential  of  a  charge  at  position  x'  is 

F(s)  =  elx'lc(x'’s)  (2.59) 


where  s  €  S2,  S2  is  the  unit  sphere,  and  c(x',  s )  is  the  cosine  of  the  angle  between  x  and  s. 

The  potential  of  the  charge  at  the  point  (r,  fl),fl  €  S2,  can  be  determined  for  r  >  \x'\  from 
the  far-field  potentials  by  [18] 

d>=  lim  /  yil(2l  +  l)hi{kr)Pl{c{Q,s))  (2.60) 

n — >oo  I  q2  L ' 

Jb  1=0 

This  suggests  that  a  good  way  to  calculate  the  grid-projection  operators  might  be  to  require 
that  the  far-field  potential  of  the  grid-charges  and  the  actual  charge  distribution  match  at  the 
collocation  points  sT  G  S2.  Let  the  far-field  of  the  error  c/)e  in  the  potential  be  given  by 


F(s)  —  E  Ql' ,rn’Yv ,m'{s) 
V  jml 


(2.61) 


From  the  definition  of  the  far-field  expansion  (Equation  2.59)  and  10.1.47  in  [39]  it  follows  that 
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*■(.)  =  E  E 

l'=0m'=-l 


-  J  P(x>)jll(kr,)Yl',m'(UJ') 


Yv,m'(s )  (2.62) 


where  Uj,u  €  S2  are  the  angular  coordinates  of  the  jth  grid-charge  a  source  charge  point. 
Then,  from  the  collocation  condition  it  follows  that 


OO  l 

qi',m'  =  E  E  E^E m(kaj)Y^(wj)  V<M  + 1 
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From  this  and  Eq.  2.45,  it  follows  in  a  manner  similar  to  the  preceding  proofs  that 


(2.63) 
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(2.64) 


2.6  Applications  and  competing  approaches 

While  the  grid  operators  described  here  were  developed  with  the  precorrected-FFT  tech¬ 
nique  in  mind,  they  can  be  incorporated  into  any  multi-level  scheme  [21,  6].  The  representation 
described  here  has  two  advantages  which  allow  it  to  be  efficient.  First,  because  of  the  regular 
spacing  of  the  grid  charges,  fast  ( 0(1 2)  log  Z,  where  l  is  the  order  of  the  quadrature  rule)  trans¬ 
lation  and  potential  evaluation  operators  exist.  It  appears  that  in  the  approach  in  [21],  only 
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the  0(l4)  direct  operators  are  available.  Secondly,  the  sharing  of  grid  charges  between  compu¬ 
tational  cells  allows  for  a  reduction  in  the  total  number  of  coefficients  needed  to  represent  the 
potential  in  each  cell  of  the  computational  domain.  That  is,  if  there  are  N  cells  in  the  domain, 
and  p3  grid  charges  are  used  to  represent  the  potential  in  each  cell,  then,  for  large  N  where 
we  may  neglect  edge  effects,  the  total  number  of  grid  charges  is  only  N(p  -  l)3,  a  significant 
reduction  for  small  p.  For  most  engineering  problems,  we  expect  p  <  5,  so  the  sharing  effect  will 
still  be  significant.  An  additional  advantage  of  the  grid-based  approach  is  that  the  potential 
throughout  the  domain  can  be  obtained  at  little  additional  cost  once  the  panel  charges  have 
been  determined  [40]. 

Many  other  approaches  to  computing  the  grid  charge  are  available.  For  an  alternative 
approach,  based  on  matching  multipole  expansion  coefficients  directly,  see  [37].  [6]  discusses 
deriving  coefficients  from  polynomial  interpolation. 


2.7  Empirical  Grid  Error  Analysis 


First  we  consider  the  worst-case  errors  in  the  grid-charge  representation,  and  compare  to 
representation  via  multipole  expansions.  For  an  order- A  multipole  expansion  (pm,  the  bound 
on  the  error  <f>  —  <f>m  in  the  approximation  is 


\<j>  4>m\  <  ®  (r) 


L+l 


i  -m  (2-65) 

while  for  a  grid-charge  representation  cp;j  ,  collocating  at  nodes  of  an  order  M  quadrature 
rule,  the  error  must  satisfy 


(2.66) 


For  an  order- M  quadrature  rule,  it  seems  reasonable  to  define  an  effective  order  of  the 
grid-charge  error  Ng  =  (M  —  l)/2.  Figure  2-5  shows  the  error  vs.  effective  order  for  multipole 
expansions  and  the  grid  scheme,  assuming  cell  interactions  up  to  first-near-neighbors  will  be 
done  exactly,  for  some  difficult  sample  charge  distributions.  The  worst-case  error  for  multipole 
expansions  appears  to  often  occur  for  a  charge  located  at  the  corner  of  the  unit  cube,  and 
an  evaluation  point  projected  onto  the  nearest  face  of  the  near-neighbor  cube  boundary.  This 
source-charge/ test-point  configuration  was  used  for  the  calculations  of  multipole  expansion 
errors  in  Figures  2-5  and  2-7.  The  sample  charge  for  the  evaluation  of  the  grid-charge  was 
placed  sometimes  at  the  corner,  and  sometimes  on  the  cube  edge  halfway  between  two  grid 
charge  points,  whichever  generated  the  largest  error,  with  the  evaluation  point  again  at  the 
nearest  point  of  the  near-neighbor  cube  boundary.  For  the  same  order,  the  multipole  and  grid- 
charge  representations  have  comparable  errors,  indicating  that  the  bound  of  Eq.  2.66  is  relatively 
loose.  Thus  to  obtain  an  idea  of  the  relative  efficiencies  of  the  two  approaches,  it  is  reasonable 
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err  vs.  order,  INN 


Figure  2-5:  Error  vs.  effective  order  for  multipole  expansions  (x)  and  grid-projection  scheme 
(*),  at  first-near-neighbors  cells.  For  an  order-M  quadrature  rule,  the  effective  order  of  the 
grid-projection  scheme  is  ( M  —  l)/2. 

to  compare  the  number  of  coefficients  per  unit  cell  required  to  obtain  a  given  order  of  accuracy. 
For  an  order-L  multipole  expansion,  (L  + 1)2  coefficients  are  needed.  For  a pxpxp  grid-charge 
representation,  there  are  (p- 1)3  charges  per  unit  cell.  The  order  of  the  approximation  depends 
on  the  quadrature  rule  used,  but  in  general  an  efficient  rule  with  order  M  ~  2[y/p3/2j  -  1  is 
available.  From  Figure  2-6,  which  shows  the  number  of  coefficients  required  for  both  schemes, 
it  is  clear  that  in  terms  of  the  effective  order,  the  grid-charge  representation  is  more  efficient 
at  low  orders.  This  is  due  both  to  the  greater  amount  of  charge-sharing  at  low  orders,  and  to 
the  availability  of  efficient  non-product  integration  rules.  Figure  2-7  shows  that  in  terms  of  the 
effective  number  coefficients  per  cell,  the  grid-based  scheme  may  have  superior  worst-case  error 
performance  up  to  at  least  order  6  or  so. 

Next,  we  consider  the  behavior  of  errors  away  from  the  charge  distribution.  Consider  placing 
a  single  point  charge  in  the  basic  computational  cell  and  using  the  various  projection  schemes 
to  approximate  its  potential  via  sets  of  point  charges  at  the  regular  grid  points.  If  the  cell  is  a 
cube  with  side  length  d,  empirical  analysis  suggests  that  the  grid-collocation  projection  scheme 
has  largest  error  when  the  point  charge  is  located  on  the  face  of  the  cube  with  coordinates 
(d/2) (1  —  1  /(p  —  1),  1, 1  —  1  /{p  —  1)),  with  the  evaluation  point  taken  at  a  point  along  a  line 
parallel  to  the  y- axis  and  intersecting  the  test  charge,  i.e.  (d/2) (1 — 1/ (p-1),  1+Ay,  1  —  1/ (p-1)), 
for  some  increment  Ay.  Figures  2-8  to  2-11  show  errors  at  various  values  of  rc  and  kd ,  with 
k  the  wavenumber.  Errors  for  the  use  of  polynomial  interpolation  and,  for  k  >  0,  the  far-field 
collocation  approach  are  also  shown. 

Consider  first  the  (b)  plots,  where  rc  is  small.  For  all  orders  of  approximation  the  error 
decays  slowly  away  from  the  charge  distribution.  Since  in  this  case  rc  ~  rmin,  from  the  error 
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Figure  2-6:  Effective  number  of  coefficients  per  cell  vs.  effective  order  for  multipole  expansions 
(x)  and  grid-projection  scheme  (*). 


Figure  2-7:  Error  vs.  effective  number  of  coefficients  per  cell  for  multipole  expansions  (x)  and 
grid-projection  scheme  (*),  at  first-near-neighbor  cells. 
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V 

Rule  Order  M 

Source 

max  kd 

P  =  2 

3 

[41] 

1.5 

P  =  3 

7 

[41] 

3 

p  =  4 

11 

[41] 

5 

p  =  5 

17 

[42] 

8 

p  =  6 

23 

[43] 

10 

Table  2-1:  Quadrature  rules  used  for  the  grid-projection  scheme  experiments. 


analysis  of  Eqs.  2.38  and  2.39,  we  expect  the  error  to  behave  essentially  as  a  monopole,  dying 
slowly  away  from  the  origin,  regardless  of  the  order  of  the  quadrature  rule.  We  only  expect 
the  order  of  quadrature  rule  to  change  the  constant  factor  in  front  of  the  error  term.  Notice 
that  in  general,  when  rc  is  increased  in  the  (c)-(e)  plots,  the  worst-case  errors  do  not  change 
drastically,  again  as  is  predicted  by  our  previous  analysis.  The  variation  of  error  with  distance, 
however,  changes  considerably.  As  the  collocation  sphere  radius  is  increased,  the  magnitude  of 
the  low  order  multipole  coefficients  of  the  error  decreases,  and  the  errors  decay  rapidly  with 
distance.  Note  that  the  sharp  error  decay  associated  with  high  order  multipole  approximation 
ends  at  about  the  collocation  sphere  radius. 

As  k  is  increased  in  Figures  2-9  to  2-11,  the  various  approximation  methods  all  become  less 
accurate,  and  the  low-order  schemes  become  totally  inaccurate.  Table  2-1  shows  the  expected 
value  of  kd  at  which  the  collocation  scheme  should  fail  for  each  order,  based  on  the  M  >  2k d 
criterion.  Our  results  are  in  good  agreement  with  these  estimates.  In  particular  the  p  =  6 
collocation  scheme  retains  good  accuracy  even  at  the  highest  frequency  considered.  Since  the 
order  of  the  collocation  rule  (23  in  this  case)  is  much  greater  than  2 kd,  we  expect  the  errors 
to  behave  similarly  to  the  1  jr  behavior,  and  this  indeed  appears  to  be  the  case.  Note  that  at 
kd  =  5,  the  basic  computational  cell  is  nearly  a  wavelength  long. 

We  also  note  that  the  collocation  schemes  are  usually,  but  not  always,  more  accurate  than 
polynomial  based  projection.  It  is  our  experience  in  general  that,  at  low  order,  the  collocation- 
based  schemes  have,  on  average,  errors  20%-40%  smaller  than  polynomial-based  approxima¬ 
tions.  The  collocation-based  schemes  are  particularly  advantageous  in  terms  of  worst-case 
errors  at  high  wavenumber,  compare,  e.g.,  Figure  2-ll(a)  and  (f),  where  the  collocation  scheme 
can  by  superior  be  more  than  an  order  of  magnitude. 

Table  2-2  provides  a  final  perspective  on  the  projection  schemes.  It  gives  the  errors  for  a 
point  charge  positioned  as  for  the  Figures  2-8  to  2-11,  with  evaluation  points  taken  at  the  edge 
of  a  first-near-neighbor  cube  and  second-near-neighbor  cube.  It  is  expected  that  these  should 
be  close  to  the  worst-case  approximation  errors. 


# 
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(a)  polynomial,  kd  =  0 


-2 


(b)  rc  =  1.5c?,  kd  =  0 


Figure  2-10:  Error  in  grid  approximations.  From  bottom  to  top,  p  =  6,5 
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3.42e-06 

rc  =  1.5  d 

1 

119' 

0.00050 

4.89e-05 

0.083 

0.00366 

0.00011 

3.28e-06 

6.13e-06 

rc  =  2d 

1 

0.26 

0.022 

0.00044 

6.94e-06 

0.075 

0.00179 

0.00012 

2.69e-06 

1.97e-07 

rc  =  5  d 

1 

0.24 

0.018 

0.00062 

0.00012 

0.066 

0.00090 

8.95e-05 

4.44e-06 

2.23e-07 

rc  =  8  d 

1 

0.24 

0.018 

0.00057 

0.00028 

0.065 

0.00085 

8.74e-05 

5.62e-06 

2.52e-06 

far-field 

1 

0.24 

0.018 

0.00053 

0.00026 

0.063 

0.00082 

8.64e-05 

6.01e-06 

2.83e-06 

Poly 

2.5 

0.53 

0.052 

0.0118 

0.00361 

0.00016 

0.15 

0.0067 

0.00202 

0.00023 

2.8e-05 

rc  =  1.5  d 

2.5 

0.98 

0.065 

0.0025 

0.00070 

6.3e-05 

0.49 

0.0165 

0.00090 

1.01e-05 

2.74e-05 

rc  =  2d 

2.5 

0.92 

0.052 

0.0027 

0.00062 

5.29e-06 

0.46 

0.0125 

0.00065 

1.06e-05 

1.32e-06 

LO 

II 

2.5 

0.84 

0.042 

0.0026 

0.00052 

8.82e-05 

0.41 

0.00865 

0.00050 

1.08e-05 

2.17e-07 

"8 

00 

II 

o 

8- 

2.5 

0.83 

0.040 

0.0026 

0.00053 

8.87e-05 

0.40 

0.00793 

0.00048 

l.le-05 

2.27e-07 

far-field 

2.5 

0.80 

0.039 

0.0026 

0.00052 

9.74e-05 

0.39 

0.00693 

0.00046 

1.09e-05 

1.49e-07 

Poly 

5 

0.98 

0.17 

0.0454 

0.0189 

0.00318 

0.29 

0.0256 

0.00785 

0.00167 

0.00026 

rc  =  1.5d 

5 

3.54 

0.55 

0.0243 

0.0034 

0.00018 

2.1 

0.193 

0.022 

0.00084 

0.00012 

r  c  =  2d 

5 

5.11 

0.50 

0.0237 

0.0029 

9.19e-05 

2.9 

0.175 

0.0216 

0.00077 

1.49e-05 

rc  =  5  d 

5 

13.5 

0.45 

0.0204 

0.0026 

0.00053 

7.2 

0.151 

0.0185 

0.00074 

4.84e-06 

II 

00 

5 

19.3 

0.43 

0.0196 

0.0025 

0.00015 

10.2 

0.144 

0.0176 

0.00075 

1.13e-05 

far-field 

5 

50.1 

0.40 

0.0185 

0.0025 

5.44e-05 

28.7 

0.132 

0.0163 

0.00077 

1.49e-05 

Table  2-2:  Maximum  errors  at  first-  and  second- near-neighbor  cubes  for  various  grid-projection 
schemes. 
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The  Precorrected-FFT  Algorithm 


3.1  The  Basic  Idea 

To  develop  a  faster  approach  to  computing  the  matrix- vector  product,  consider  the  paral¬ 
lelepiped  which  contains  a  three  dimensional  problem  after  it  has  been  discretized  into  n  panels. 
The  parallelepiped  containing  the  problem  could  be  subdivided  into  an  k  x  l  x  m  array  of  small 
cubes  so  that  each  small  cube  contains  only  a  few  panels.  Figure  3-1  (a)  shows  a  discretized 
sphere,  with  the  associated  space  subdivided  into  a  3  x  3  x  3  array  of  cubes.  These  small  cubes 
are  be  the  cells  referred  to  in  the  previous  chapter. 

The  precorrected-FFT  algorithm  is  motivated  by  the  approximation  scheme  discussed  in  the 
previous  chapter.  Potentials  at  evaluation  points  distant  from  a  cell  can  be  accurately  computed 
by  representing  the  given  cell’s  charge  distribution  using  a  small  number  of  weighted  point 
charges.  If  the  point  charges  all  lie  on  a  uniform  grid,  for  example  at  the  cell  vertices,  then  the 
computation  of  the  potential  at  the  grid  points  due  to  the  grid  charges  is  a  discrete  convolution 
which  can  be  performed  using  the  FFT.  Figure  3-1  (b)  show  a  possible  set  of  grid  charge  for  the 
cell  subvisions  shown  in  Figure  3-1  (a).  Thus,  a  four-step  method  for  approximating  Pq  is 

1.  project  the  panel  charges  onto  a  uniform  grid  of  point  charges, 

2.  compute  the  grid  potentials  due  to  grid  charges  using  an  FFT, 

3.  interpolate  the  grid  potentials  onto  the  panels,  and 

4.  directly  compute  nearby  interactions. 

This  process  is  summarized  in  Figure  3-2.  We  emphasize  that  the  grid  of  point  charges  is 
introduced  purely  as  a  computational  aid,  it  is  not  related  to  the  underlying  discretization  of 
the  conductors. 

Given  a  set  of  M  cells  which  contain  the  set  of  n  panels  and  define  the  n  grid  points,  we 
now  describe  how  to  compute  the  vector  of  potentials  ip  E  Rn  from  the  vector  of  panel  charges 


Figure  3-1:  (a)  Side  view  of  a  sphere  discretized  into  320  panels,  with  spatial  decomposition 
into  a  3  x  3  x  3  array  of  cells,  (b)  Superimposed  grid  charges  corresponding  to  the  cell  decom¬ 
position  of  (a),  with  p  =  3.  In  each  cell,  a  3  x  3  x  3  array  of  grid  charges  is  used  to  represent 
the  long  range  potential  of  the  charged  panels  in  the  cell.  Some  of  the  grid  charges  are  shared 
among  cells.  Note  that  the  grid  is  “coarser”  than  the  triangular  panels  used  to  discretize  the 
sphere.  The  grid  extends  outside  the  problem  domain  because  the  number  of  grid  points  is  • 

required  to  be  a  factor  of  two. 


Figure  3-2:  2-D  Pictorial  representation  of  the  four  steps  of  the  precorrected-FFT  algorithm. 
Interactions  with  nearby  panels  (in  the  grey  area)  are  computed  directly,  interactions  between 
distant  panels  are  computed  using  the  grid. 
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q  E  R”.  4>g  £  Rn  will  denote  the  contribution  of  the  grid  charges  to  the  potentials  on  the  n 
panel  charges.  n(k )  denotes  the  number  of  panels  in  a  cell  k,  q(k)  €  the  restriction  of  the 

charge  vector  q  to  the  indices  whose  corresponding  panels  lie  in  cell  k  and  'ip(k)  E  R71'^'  denotes 
the  similar  restriction  of  the  potential  vector,  p  denotes  the  order  of  grid  approximation,  q  E  Rn 

A  3  ^  3 

is  the  vector  of  grid  charges,  ip  the  vector  of  grid  potentials,  and  q(k)  E  Rp  ,  ip(k)  E  Rp  denote 
the  restriction  of  q  and  ip  respectively  to  grid  points  of  cell  k.  We  define  N(k)  to  be  the  indices 
of  the  set  of  cells  which  are  “near”  cell  k.  W,  V  :  Rn  — ►  Rn,  yet  to  be  defined,  will  refer  to 
linear  operators  which  project  n  uniformly  distributed  panel  charges  to  the  n  grid  points,  and 
the  linear  operator  H  :  Rn  — >  Rn  gives  grid  potentials  in  terms  of  grid  charges,  i.e.  ip  =  Hq. 
W(k,j )  :  R  — >  Rp3  is  the  nonzero  part  of  W  corresponding  to  charge  j  in  cell  k,  1  <  j  <  n(k); 

3  3 

V(k,j )  is  the  similar  part  of  V,  and  H(k,l)  :  Rp  — ►  Rp  is  the  block  of  H  which  maps  grid 
charges  q(l)  of  cell  l  to  grid  potentials  of  cell  k,  ip(k)  =  H(k,l)q(l).  A  subscript  indicates  an 
index  into  a  matrix  or  vector,  e.g,  qj{k)  is  the  jth  entry  of  vector  q(k). 

3.2  Projecting  onto  a  grid 

The  first  step  in  the  description  of  the  algorithm  is  to  describe  the  construction  of  the 
operator  W.  As  was  discussed  in  detail  in  Chapter  2,  for  panel  charges  contained  within  a 
neighborhood  of  a  given  cell,  the  potentials  at  evaluation  points  distant  from  the  given  cell 
can  be  accurately  computed  by  representing  the  given  cell’s  charge  distribution  with  a  small 
number  of  appropriately  weighted  point  charges  on  a  uniform  grid  thoughout  the  given  cell’s 
volume.  Figure  3- 1(b)  shows  the  grid  imposed  on  the  cell  structure  of  Figure  3- 1(a)  when  a 
3x3x3  array  of  grid  charges  is  used  to  represent  the  charge  in  each  cell.  Note  that,  because 
the  grid  is  only  used  to  represent  the  long-range  part  of  the  panel  potentials,  the  grid  may  be 
significantly  coarser  than  the  actual  problem  discretization. 

For  any  panel  charge  j  in  cell  k,  the  projection  operation  generates  a  subset  of  the  grid 
charges  q(k).  To  obtain  each  panel’s  contribution  to  q,  it  is  necessary  to  calculate  Pgt  and  [Pgt]  ^ 
Since  Pgt  is  small  and  is  the  same  for  each  cell,  its  singular  value  decomposition  need  only  be 
performed  once,  and  so  the  relative  computational  cost  of  calculating  [Pgt]  ^  is  insignificant. 
The  final  contribution  to  q(k)  from  the  charges  in  cell  k  is  generated  by  summing  over  all 
the  charges  in  the  cell.  Note  that  panel  charges  outside  cell  k  may  contribute  to  some  of  the 
elements  of  q(k)  in  the  case  of  shared  grid  charges. 


3.3  Computing  Grid  Potentials 


Once  the  charge  has  been  projected  to  a  grid,  the  operation  H,  computing  the  potentials  at 
the  grid  points  due  to  the  grid  charges,  is  a  three-dimensional  convolution.  We  denote  this  as 

=  Hq=  Yj  M* -i'J  ~j\k  ~  k')q{i' ,j' ,k')  (3.1) 

where  i,j,  k  and  k'  are  triplets  specifying  the  grid  points  and  h(i  -  i',j  —  j',  k  -  k ')  is  the 
inverse  distance  between  grid  points  i,j,  k  and  i',j',k'.  As  will  be  made  clear  below,  /t(0, 0, 0) 
can  be  arbitrarily  defined,  and  is  set  to  zero.  The  above  convolution  can  be  rapidly  computed 
by  using  the  Fast  Fourier  Transform  (FFT).  In  practice,  each  convolution  requires  one  forward 
and  one  inverse  three-dimensional  FFT.  The  discrete  Fourier  transform  of  the  kernel  matrix  H, 
denoted  H,  need  be  computed  only  once. 

An  efficient  FFT  implementation  is  central  to  the  performance  of  the  precorrected-FFT 
algorithm.  The  FFT  is  a  very  well-studied  algorithm,  and  many  possible  implementations  exist. 
Most  FFT  implementations  have  a  fairly  regular  nature,  and  thus  very  efficient  optimized  code 
can  be  developed.  Also,  the  structure  of  the  data  in  a  multidimensional  convolution  can  be 
exploited  for  additional  performance  gains.  For  example,  the  use  of  the  FFT  to  perform  a 
linear  multidimensional  convolution  involves  embedding  the  data  ( q )  to  be  transformed  into  a 
larger  data  space,  much  of  which  is  zero.  The  fact  that  much  of  the  transformed  data  is  zero 
can  be  exploited  to  yield  a  more  efficient  transform.  In  comparison,  achieving  optimal  machine 
performance  with  fast  multipole  algorithms  is  more  difficult,  due  to  the  less  regular  nature  of 
the  algorithms. 

3.4  Interpolating  Grid  Potentials 

Once  the  grid  potentials  have  been  computed,  they  must  be  interpolated  to  the  panels  in 
each  cell.  When  a  collocation  scheme  is  used  to  discretize  the  integral  equation,  the  operator 
which  interpolates  potential  at  grid  points  in  cell  k  to  a  charge  j  also  in  cell  k  is  not  [W(k,j)]T 
defined  in  Eq.  2.19.  Instead,  the  projection  operator  V(k,j)  for  a  point  charge  located  at  the 
collocation  point  is  computed  which  gives  the  interpolation  operator  [V(k,  j)]T .  However,  if  a 
Galerkin  scheme  is  used  in  the  discretization  then  the  interpolation  operator  is  [W(k,j)]T. 

Thus,  projection,  followed  by  convolution,  followed  by  interpolation  gives  the  grid-charge 
approximation  tpa  to  the  potentials  which  can  be  represented  as 

V-g  =  VTHWq.  (3.2) 

If  Galerkin  methods  are  used,  Eq.  3.2  becomes 

ipG  =  WT  HW  q  (3.3) 
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and  therefore  the  precorrected-FFT  method  preserves  the  symmetry  of  the  Galerkin  discretiza¬ 
tion  for  free-space  problems. 


3.5  Precorrecting 

The  difficulty  with  the  above  three  steps  is  that  the  calculations  using  the  FFT  on  the  grid  do 
not  accurately  approximate  the  nearby  interactions.  In  ^  of  (3.2),  the  portions  of  Pq  associated 
with  neighboring  cell  interactions  have  already  been  computed,  though  this  close  interaction 
has  been  poorly  approximated  in  the  projection/interpolation.  A  more  accurate  calculation  of 
interactions  between  nearby  panels  is  needed,  but  it  is  also  necessary  to  remove  or  avoid  the 
inaccurate  contribution  from  the  use  of  the  grid.  This  is  a  general  difficulty  with  grid-based 
potential  calculation  methods,  and  a  variety  of  correction  methods  have  been  proposed[44, 12,  6], 
the  details  of  which  usually  depend  on  the  problem  being  solved,  the  interpolation  scheme,  and 
the  nature  of  the  grid  solver. 

Because  our  algorithm  works  directly  with  the  Green  function,  and  because  the  iterative 
solver  requires  that  many  potential  evaluations  are  performed  for  a  given  panel  configuration,  it 
is  possible  to  treat  nearby  panel  interactions  exactly,  without  sacrificing  algorithmic  efficiency. 
We  accurately  treat  interactions  between  panels  close  together  by  modifying  the  way  nearby 
interactions  are  computed,  a  step  we  refer  to  as  precorrection. 

In  particular,  denote  as  P(k,  l)  the  portion  of  P  associated  with  the  interaction  between 
neighboring  cells  k  and  l,  V(k)  and  W(l)  the  matrices  formed  from  the  columns  V(k,j)  and 
W(k,j)  respectively,  and  denote  ip(k\i)  as  the  panel  potentials  in  cell  k  due  to  the  charges  qi  in 
cell  l.  Then 

1>G,m  =  V(k)TH(k,l)W(l)qi  (3.4) 

is  the  grid-approximation  to  which  is  inaccurate.  Subtracting  this  approximation  and 

then  adding  the  correct  contribution, 

$(k\l)  =  ^G,(k\l)  +  (Pk,i  -  V(k)THk}lW(l))  qh  (3.5) 

produced  the  accurate  result  P(k,l)qi. 

This  may  be  efficiently  accomplished  by  defining 

P(k ,  l)  =  P(k,  l )  -  V(k)TH{k,  l)W(l)  (3.6) 

to  be  the  “precorrected”  direct  interaction  operator.  When  used  in  conjunction  with  the  grid 
charge  representation  P(k ,  l)  results  in  exact  calculation  of  the  interactions  of  panels  which 
are  close.  Assuming  that  the  Pq  product  will  be  computed  many  times  in  the  inner  loop 
of  an  iterative  algorithm,  P  will  be  expensive  to  initially  compute,  but  will  cost  no  more  to 
subsequently  apply  than  P. 


3.6  Complete  Algorithm 

Combining  the  above  steps  leads  to  the  precorrected-FFT  algorithm,  which  rapidly  com¬ 
putes  the  Pq  dense  matrix-vector  product.  Using  the  above  notation,  the  algorithm  can  be 
described  as  two  steps.  The  first  step  is  to  compute 

ipo  =  VTHWq.  (3.7) 

W  and  V  are  sparse  interpolation  operators,  and  H  can  be  represented  in  a  sparse  manner  via 
the  FFT.  The  second  step  is  to  add  in  the  corrected  direct  interactions,  to  obtain  the  panel 
potentials  k )  for  each  cell 

iK*)=iM*)+  E  (3-8) 

i  e  N(k) 

Because  for  each  k ,  N(k)  is  a  small  set  and  each  matrix  P(k,l)  is  also  small,  this  second  step 
is  also  a  sparse  operation.  The  complete  algorithm  is  given  below  in  pseudocode  form. 

Precorrected-FFT  Algorithm  to  Compute  Pq 

/*  Projection  Step  */ 

Set  </  =  0 

For  each  cell  k  =  1  to  M  { 

For  each  panel  j  in  cell  k,  j  =  1  to  n(k)  { 

Add  q(k)  =  q(k)  +  W{k,j)qj(k) 

} 

} 

/*  Convolution  Step  */ 

Compute  Q  =  FFT(^) 

Compute  ^  =  HQ 
Compute  =  FFT-1^) 

/*  Interpolation  Step  */ 

Set  ip  =  0 

For  each  cell  k  =  1  to  M  { 

For  each  panel  j  in  cell  k,  j  =  1  to  n(k)  { 

Add  ipj(k)  =  ipj(k )  +  [V{k,j)]T'ipj{k ) 

} 

} 

/*  Nearby  Interactions  */ 

For  each  cell  k  =  1  to  M  { 

For  each  cell  l  in  N(k) 

m=m+p(kM  o 

} 

} 


Thus,  the  effect  of  this  algorithm  is  to  replace  the  operation 


where  P  is  a  dense  matrix,  with  the  operation 

<-  [P  +  VTHW]  q  (3.10) 

where  all  the  matrices  P,  V,  H,  W  possess  sparse  representations. 

3.7  Grid  selection 

Before  the  algorithm  has  been  completely  specified,  it  it  necessary  to  specify  how  panels  are 
selected  for  inclusion  in  direct  interaction  regions  and  how  the  grid  size  is  selected.  That  is,  for 
each  cell  k  the  set  N(k)  must  be  specified.  To  insure  that  interactions  between  panels  which 
are  close  together  are  treated  accurately,  at  a  minimum  it  is  necessary  to  compute  interactions 
between  panels  in  cells  which  are  near-neighbors  of  each  other  via  direct  products.  The  near¬ 
neighbors  of  a  cell  /?  are  defined  to  be  all  the  cells  which  have  a  vertex  in  common  with  cell  /? 
(thus  a  cell  is  a  near-neighbor  of  itself).  We  have  included  only  near- neighbor  interactions  in 
the  computations  of  this  paper. 

The  worst-case  accuracy  of  the  grid  representation  is  a  function  of  the  ratio  of  the  cell 
radius  to  the  radius  of  the  direct  interaction  region[38].  Thus,  once  the  direct- interaction 
region  has  been  specified  to  be  near-neighbor  cells,  the  selection  of  the  cell  size,  and  hence 
the  grid  spacing  is  purely  a  matter  of  computational  efficiency.  The  cost  of  direct  interactions 
will  decrease  monotonically  as  the  cells  are  made  smaller,  but  the  number  of  grid  points  will 
increase,  raising  the  cost  of  the  FFT.  This  implies  that  the  total  cost  of  the  algorithm  will  have  a 
minimum  for  some  grid  spacing.  For  a  given  grid  spacing  and  panel  configuration,  the  memory 
and  computation  time  needed  by  precorrected-FFT  algorithm  can  be  estimated  cheaply,  so 
the  optimal  grid  spacing  can  be  obtained  by  starting  with  a  small  number  of  grid  points  and 
increasing  the  number  until  a  minimum  CPU  or  memory  estimate,  as  appropriate,  is  reached. 
In  addition,  we  have  generally  required  that  the  number  of  grid  points  be  a  factor  of  two,  in 
order  to  exploit  efficient  FFT  implementations. 

It  is  interesting  that  the  optimal  grid  size  may  occasionally  be  such  that  the  number  of  grid 
charges  n  is  larger  than  the  original  number  of  panel  charges  n.  This  may  be  the  case  even  when 
the  grid  spacing  is  larger  than  the  underlying  panel  sizes,  that  is,  when  the  grid  is  “coarser” 
than  the  panel  discretization.  Such  a  case  may  occur,  for  example,  for  a  finely  discretized  cube 
surface,  where  the  grid  must  fill  the  three-dimensional  space  of  the  cube’s  interior.  However, 
the  overall  algorithm  may  still  be  quite  effective,  since  the  cost  of  the  FFT  is  O(nlogn),  with 
a  constant  factor  of  0(10).  Thus  if  n  ~  n  and  n  is  large,  the  cost  of  the  FFT  is  less  than  that 
of  the  direct  product  by  a  factor  of  nearly  O(n),  and  so  the  algorithm  may  have  n  >  n  by  a 
fairly  significant  factor  and  still  possess  an  advantage  over  the  direct  computation. 
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Complexity  Analysis 


In  this  chapter,  we  analyze  the  complexity  of  the  precorrected-FFT  algorithm  and  give 
some  comparisons  with  other  approaches.  It  is  shown  that  for  homogeneous  problems,  the 
method  is  order  nlogn  nearly  independent  of  the  kernel.  For  an  inhomogeneity  generated  by 
a  very  finely  discretized  surface,  the  combined  method  slows  to  order  n6/5  for  l/r  kernels  (or 
any  discretizations  which  are  geometrically  restricted)  and  order  n4/3  for  Helmholtz  kernels 
(discretizations  which  are  restricted  by  wavelength). 

First  we  compare  the  efficiency  of  the  grid  representation  used  in  the  precorrected-FFT 
algorithm  to  the  multipole  expansions  used  in  the  fast  multipole  method. 

Both  the  fast  multipole  algorithm  and  the  precorrected-FFT  algorithm  obtain  efficiency  by 
representing  the  long-range  part  of  the  potential  of  a  group  of  charges  by  an  expression  which 
can  be  used  at  multiple  evaluation  points,  but  the  algorithms  differ  in  the  way  they  cluster  sets 
of  charges  together  to  form  single  expressions. 

Again  consider  subdividing  the  parallelepiped  containing  the  entire  three-dimensional  prob¬ 
lem  domain  into  a  kxlxm  array  of  cells.  Then,  the  collocation  approach  above  can  be  used  to 
generate  point  charge  approximations  for  charge  distributions  in  every  cell,  effectively  project¬ 
ing  the  charge  density  onto  a  three-dimensional  grid.  For  example,  if  the  representative  point 
charges  are  placed  at  the  cell  vertices,  then  the  panel  charge  distribution  will  be  projected  to 
a  (k  +  1)  x  (l  +  1)  x  (m  +  1)  uniform  grid.  Fast  multipole  algorithms  also  effectively  create  a 
uniform  grid  by  constructing  multipole  expansions  at  the  center  of  each  cell,  but  due  to  sharing, 
the  point  charge  approach  can  be  more  efficient.  For  example,  when  representing  the  potential 
of  a  panel  by  charges  at  the  cell  vertices,  there  are  eight  free  coefficients  which  may  be  varied 
to  obtain  an  optimal  representation,  and  there  will  be  ( k  +  l)(l  +  1  )(m  +  1)  terms  in  the  entire 
domain.  On  average,  there  is  only  one  grid-charge  per  cubic  cell,  since  a  point  charge  at  a  cell 
vertex  is  used  to  represent  charge  in  the  eight  cells  which  share  that  vertex.  By  contrast,  as 
no  sharing  occurs  in  the  the  multipole  representation,  if  there  are  q  coefficients  in  the  multi¬ 
pole  expansion  which  represents  the  potential  of  the  charges  in  the  cell,  the  total  number  of 


terms  in  the  domain  will  be  q(k  +  1  ){l  +  l)(m  4-  1).  For  an  equivalent  number  of  total  terms 
in  the  domain,  we  expect  the  grid  representation  to  be  more  accurate.  Conversely,  for  roughly 
equivalent  accuracy,  we  may  choose  q  ~  8,  but  then  the  total  number  of  multipole  terms  will 
be  significantly  higher  than  for  the  grid  representation. 

From  the  analysis  of  the  preceding  paragraph,  we  expect  the  grid  representation  to  be  locally 
more  efficient  than  the  use  of  multipole  expansions.  However,  our  current  implementation  of 
the  precorrected-FFT  algorithm  may  be  globally  less  efficient,  as  the  grid  representation  is 
introduced  throughout  space,  even  where  no  panels  are  present.  Thus,  whereas  for  a  problem 
containing  n  panels,  the  fast  multipole  algorithm  can  perform  a  potential  evaluation  for  all 
of  the  panels  in  O(n)  operations,  regardless  of  the  panel  distribution^ 6],  no  such  guarantee 
is  available  for  the  precorrected-FFT  algorithm.  However,  it  is  possible  to  establish  a  weaker 
complexity  result  for  the  precorrected-FFT  method: 

Theorem  4-1-  For  a  homogeneous  distribution  of  n  panels,  the  precorrected-FFT  method 
requires  0(n  log  n)  operations  to  perform  a  potential  calculation. 

Proof.  Given  that  the  computational  domain  is  a  parallelepiped  containing  n  panels,  again 
assume  space  has  been  divided  into  an  array  of  k  x  /  x  m  cells,  and  that  the  panel  distribution 
is  homogeneous  on  the  scale  of  the  cell  size.  That  is,  the  number  of  panels  per  cell,  Nc, 
is  bounded  independent  of  n,  with  klm  of  order-n.  Finally,  assume  that  the  grid  in  each 
cell  is  a  p  x  p  x  p  array.  There  are  three  components  in  the  cost  of  the  precorrected-FFT 
algorithm:  the  cost  of  direct  interactions,  the  cost  of  grid  projection  and  interpolation,  and  the 
cost  of  the  FFT.  The  cost  of  the  direct  interactions  will  be  0(N%  x  klm)  =  0(klm)  =  0{n). 
The  cost  of  the  grid  projection  will  be  0(np 3)  =  0{ri).  Finally,  the  cost  of  the  FFT  will 
be  O (pz klm  log  p3  klm)  =  O(nlogn).  Summing  these  costs  results  in  the  final  complexity  of 
O(nlogn). 

□ 

Since  the  grid  spacing  is  typically  less  fine  than  the  underlying  surface  discretization,  for  a 
typical  problem  the  precorrected-FFT  algorithm  has  0(n  log  n)  complexity  for  problems  with 
considerable  inhomogeneity  in  the  fine  surface  discretization,  as  long  as  the  panel  distribution 
is  homogeneous  at  a  very  coarse  level.  As  will  be  seen  in  Section  5.3,  many  structures  arising 
in  practice  satisfy  this  “coarsely  homogeneous”  condition. 

For  the  boundary-integral  methods  considered  in  this  paper,  however,  the  panels  are  usually 
not  homogeneously  distributed. 

Definition  1.  Suppose  S  is  the  boundary  of  a  simply-connected  volume  in  R3  oriented  about 
its  principle  axes.  If  {A}  is  the  set  of  panel  diameters,  {A}  is  an  homogeneous  partition  of  S 
if  there  exists  integers  ( k,l,rfi )  such  that 


k 


l 


m 


for  some  c  independent  of  A. 

Theorem  4.2.  For  a  homogenous  partition  of  a  single  closed  surface  S,  at  fixed  k  the 
precorrected-FFT  method  requires  0(n6/5  log  n)  operations  to  perform  a  potential  calculation , 
where  n  is  the  number  of  panels. 


Proof.  Again  assume  space  has  been  divided  into  an  array  of  k  x  l  x  m  cells.  From  the 
homogeneous  partition  condition  we  can  conclude  that  there  exists  constants  ei,C2  such  that 


Ci{kl  +  km  +  Im)  <  n  <  C2(kl  +  km  +  Im) 

(4.3) 

The  cost  of  the  direct  interactions  is 

r  n(^2  1  1  Qm)2\ 

kl  '  km  '  '  Im  ) 

(4.4) 

and  the  cost  of  the  FFT  is 

Cf  =  0(klm\ogklm) 

(4.5) 

The  cost  of  the  projection  and  interpolation  steps  is  neglected  as  they  are  both  always  O(n) 
in  cost.  Suppose  for  the  moment  that  k  —  l  =  m,  k  —  l  —  m,  and  for  simplicity  neglecting  the 
logarithmic  factor,  then  n  =  0(k2),  Cd  =  0(ki/k2),CF  =  0(k 3).  Optimizing  G'd  +  Cf  for  k 
gives  k  —  0(k^5)  and  the  costs  Cd  =  0(n6/5),  Cf  =  0(n6/5).  This  motivates  us  to  choose 

k  =  0(ki^),l  =  0(l^5),m  =  (^(m4/5). 

(4.6) 

In  that  case, 

Cf  =  0  ( kirn)  ^  log  k  Im  <  c^n6^5  logn 

(4.7) 

and 

CD  =  0  ((kl)6/5  +  (km)6/5  +  (7m)6/5)  <  c4n6/5 

(4.8) 

for  constants  03,04.  Thus  the  cost  is  bounded  by  0(n6/5  logn).  □ 


In  this  analysis,  we  have  assumed  that  p  is  constant.  For  a  given  problem,  when  solving 
the  Helmholtz  discretization  as  the  frequency  increases,  generally  the  number  of  panels  must 
increase  to  retain  a  fixed  number  of  panels  per  wavelength.  However,  the  size  of  a  computational 
cell  decreases  proportional  to  1/M,  or  as  n-4/5,  slower  than  n.  Thus,  for  high  frequencies  the 
criterion  in  (2.48)  that  the  order  of  the  quadrature  rule  be  greater  than  2kA  will  be  violated. 
We  must  allow  p  to  vary  with  n  to  obtain  the  correct  complexity  analysis,  which  gives  a  different 
complexity  bound. 


Theorem  4-3.  For  a  single  closed  surface  the  precorrected- FFT  method  with  y/n  inversely 
proportional  to  the  wavelength  A1  requires  at  most  0(n4/3  logn)  operations  to  perform  a  potential 
calculation ,  where  n  is  the  number  of  panels. 


Proof  The  analysis  is  similar  to  the  proceeding  theorem.  Assume  that  l  =  k  =  m,  so 
k  =  l  =  m  and  n  =  0(m2).  Assume  the  size  L  of  the  computational  domain  is  fixed.  Generally, 
a  fixed  number  of  panels  per  wavelength  required  to  maintain  the  solution  accuracy,  i.e.  m  ~ 
Lj A.  Theorem  2.5  implies  that  it  is  necessary  to  require  that  the  order  q  of  the  collocation  rule 
must  be 


(4.9) 


so 
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(4.10) 


The  number  of  collocation  points  necessary  for  order  q  quadrature  is  0(q2).  If  we  assume  that 
it  is  possible  to  satisfy  the  collocation  equations  with  p3  ~  l2  grid  charges ,  then  we  have 


P 


(4.11) 


Repeating  the  above  complexity  analysis,  we  have 


•  Direct  cost  Cd  =  0(m4/m2) 

•  Interpolation  cost  Ci  =  0(p3m 2)  =  0(m4/m2),  same  order  as  the  direct  cost 

•  FFT  cost  Cf  —  0(m3p3  log2  mp)  —  0{mm 2  log  mm2) 

Neglecting  the  log  factor,  The  total  cost  is  thus  Ct  =  0{mm 2  +m4/m2)  which  when  optimized 
for  m  gives 

m  =  0(m2/3)  (4-12) 

The  asymptotic  cost  of  the  entire  algorithm  is  then  0(ra4/3  logn),  a  slight  increase  over  the 
0(n6/5  logn)  in  the  case  of  Poisson’s  equation,  and  competitive  with  two- level  multipole  based 
schemes  for  the  Helmholtz  equation  [18]. 

We  should  also  note  that  the  cost  of  forming  the  grid  projection  operators,  0(p9)  =  0(m2)  = 
0(n)  remains  reasonable.  0 


Three  comments  are  in  order. 

First,  The  critical  assumption  in  the  above  proof  is  that  it  is  possible  to  satisfy  the  collocation 
equations  with  p3  ~  l2.  Certainly  this  is  true  if  the  grid-potential  mapping  matrix  has  full  rank, 
which  is  the  case  for  low  to  moderate  frequency  problems.  It  has  been  experimentally  verified 
that  p3  ~  l2  condition  is  satisfiable  at  least  up  to  p  =  10,  which  is  sufficient  so  solve  problems 


1This  is  the  standard  fixed-number-of-panels  per  wavelength  criterion. 
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with  the  cell  size  A  on  the  order  of  a  wavelength,  which  is  a  very  high  frequency  problem.  For 
such  high  frequency  problems  other  algorithms  are  likely  to  be  more  suitable  in  any  case. 

Second,  For  many  engineering  problems  of  interest,  the  discretization  is  geometry-limited, 
not  frequency  limited,  in  which  case  the  complexity  estimate  for  the  precorrected-FFT  method 
applied  to  surfaces  is  0(n6/5  logn). 

Third,  if  polynomials  are  used  to  generate  the  projection/interpolation  operators,  the  com¬ 
plexity  for  a  geometrically- limited  discretization  refinement  (the  analogous  theorem  is  4.2),  the 
complexity  remains  0(n6/5  logn),  though  in  general  the  constant  factor  will  increase  due  to  the 
less  accurate  approximations.  For  wavelength-limited  discretizations,  the  complexity  becomes 
0(n3/2logn).  Volume-filling  geometries  are  still  solved  at  a  cost  of  0(n  log  n)  in  both  cases. 

It  is  instructive  to  consider  these  results  in  mind  of  other  available  grid-based  algorithms. 
In  order  to  solve  the  underlying  potential-theoretic  problem,  the  precorrected-FFT  algorithm 
introduces  a  uniform  grid  which  covers  the  problem  domain  volume.  Some  comments  are  in 
order  on  the  differences  and  similarities  between  the  precorrected-FFT  algorithm  and  other 
methods  which  introduce  volumetric  grids. 

First,  most  other  methods  which  use  a  grid  to  represent  the  solution  throughout  space, 
such  as  finite-difference  methods,  finite-element  methods,  or  integral  equation  methods  which 
directly  exploit  the  convolutional  properties  of  the  kernel  via  the  FFT[45,  46,  47],  introduce 
a  space-filling  grid  which  must  also  accurately  represent  the  complicated  problem  geometry. 
These  two  conflicting  requirements  generally  result  in  either  restricted  geometries  or  a  very 
large  number  of  unknowns  that  in  turn  limits  the  size  of  problem  that  can  be  effectively  solved. 

In  contrast,  as  shown  in  Figure  3-1,  the  grid  introduced  by  the  precorrected-FFT  algorithm 
is  geometrically  unrelated  to  the  underlying  surface  discretization  of  the  geometry.  In  general 
the  number  of  panels  in  a  surface  discretization  is  much  smaller  than  the  number  of  elements 
in  a  volume  representation,  so  we  expect  the  precorrected-FFT  algorithm  to  be  more  efficient, 
than,  for  example,  finite-difference  approaches. 

Additionally,  most  other  three-dimensional  grid-based  approaches  necessarily  have  a  com¬ 
plexity  of  A:3,  if  A;  is  the  number  of  basis  elements  along  a  side.  The  precorrected-FFT  method 
analyzed  here  uses  0(n  ~  k2)  basis  elements  in  the  underlying  surface  discretization,  and  the 
complexity  is  0(n)  — >  0(n12)  =  0(k 2)  — >  0(k2A).  At  k  =  50  basis  elements  per  dimension, 
corresponding  only  to  a  15,000  panel  problem,  k3  0  exceeds  k2A  by  more  than  a  factor  of  20. 

In  short,  because  of  the  decoupling  of  short-range  interactions  from  the  long-range  inter¬ 
actions  treated  by  the  grid,  the  precorrected-FFT  method  can  efficiently  utilize  fast  potential 
solvers  without  sacrificing  the  ability  to  represent  complicated  surface  geometries  in  a  compact 


manner. 
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5 


Algorithm  Performance  on  Laplace 

Problems 


Many  diverse  engineering  applications,  such  as  analysis  of  signal  integrity  in  integrated- 
circuit  interconnect,  characterization  of  electrical  packaging,  and  design  of  microelectromechan¬ 
ical  systems[48]  require  rapid,  accurate  electrostatic  analysis  of  complicated  three-dimensional 
structures.  Often  the  electrostatic  analysis  requires  the  solution  of  the  Laplace  equation  with 
Dirichlet  or,  less  often,  Neumann  boundary  conditions  specified  on  a  complicated  surface  that 
lies  in  three  dimensions. 

This  problem  can  be  solved  as  discussed  in  Chapter  1:  by  formulating  a  first-kind  integral 
equation  with  1  jr  kernel  which  is  then  discretized  by  a  weighted-residuals  technique.  Other 
recent  work  has  been  based  on  random- walk  methods[49],  partitioning  heuristics  combined 
with  techniques  from  matrix  extension  theory[50,  51],  finite-difference[52,  53]  or  finite-element 
methods[54,  55]. 

In  this  chapter  the  effectiveness  of  the  precorrected-FFT  approach  in  performing  the  matrix- 
vector  product  needed  in  an  iterative  solution  of  the  dense  discretized  system,  such  Equation 
1.5,  is  demonstrated  via  several  empirical  studies.  First  we  present  a  study  of  the  global 
errors  introduced  by  the  method.  Then,  as  a  prototype  application,  we  consider  the  extraction 
of  coupling  capacitances  in  three-dimensional  geometries.  We  present  extensive  experimental 
comparisons  with  the  capacitance  extraction  code  FASTCAP[7]  and  demonstrate  that,  for 
a  wide  variety  of  geometries  commonly  encountered  in  VLSI  design  ,  the  precorrected-FFT 
algorithm  is  superior  to  the  fast  multipole  algorithm  used  in  FASTCAP  in  terms  of  execution 
time  and  memory  use.  At  engineering  accuracies,  in  terms  of  a  speed-memory  product,  the 
new  algorithm  can  be  superior  to  the  fast  multipole  based  schemes  by  more  than  an  order  of 
magnitude. 
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5.1  Empirical  Error  Analysis 


In  this  section  we  examine  some  simple  examples  in  order  to  evaluate  the  errors  introduced 
by  the  grid  projection  method. 

As  described  above,  in  the  precorrected-FFT  algorithm,  the  interaction  between  panels 
in  neighboring  cells  is  computed  exactly,  but  more  distant  interactions  are  approximated  by 
projection,  convolution,  and  then  interpolation  using  the  grid.  To  demonstrate  that  the  errors 
due  to  using  the  grid  are  well-controlled,  we  present  an  empirical  error  study  based  on  an 
analytically  solvable  potential  problem  borrowed  from  [16].  If  Equation  1.2  is  solved  on  a 
sphere  with  given  potential 

V>(M)  =  -- t5-1) 

the  analytically  computable  charge  distribution  is 

=  (5-2) 

To  estimate  the  error  introduced  by  the  grid  approximations  in  the  precorrected-FFT 
method,  the  sphere  can  be  discretized,  as  in  Figure  5-1,  and  the  charges  qi  on  each  panel  i 
computed.  The  approximations  introduced  by  the  grid-charge  approximation  to  long-range  in- 
terations  will  become  evident  as  the  discretization  is  refined,  since  eventually  these  errors  will 
dominate  over  the  discretization  error.  One  relative  measure  of  the  error  is 
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(5.3) 


where  the  sum  runs  over  all  panels  i ,  xi  is  the  centroid  of  panel  i,  at  the  area  of  that  panel,  q% 
the  charge  on  the  panel,  and  Q  the  exact  total  charge  on  the  sphere.  Figure  5-1  (b)  shows  that 
for  the  low-order  piecewise-constant  collocation  scheme,  as  the  discretization  of  the  sphere  is 
refined  (see  Fig.  5-1  (a)),  the  integrated  error  e  decreases  proportional  to  1/n.  The  multipole  or 
precorrected-FFT  approximation  errors  are  evident  when  e  ceases  to  decrease  as  n  is  increased. 
For  example,  for  the  2x2x2  grid-charge  representation  is  used  in  each  cell  (p  =  2),  e  ceases  to 
decrease  below  about  0.05.  This  indicates  that  the  p  =  2  grid  charge  scheme  introduces  errors 
into  the  integrated  charge  calculation  of  about  5%.  Similarly,  we  expect  the  p  =  3  scheme  to 
be  accurate  to  almost  a  tenth  of  a  percent.  We  have  also  shown  results  for  the  l  =  2  multipole 
approximation,  which  from  this  experiment  we  expect  to  be  intermediate  in  accuracy  between 
the  grid  p  =  3  and  p  =  2  approximations.  Note  that  these  errors  are  smaller  in  a  relative  sense 
than  the  worst-case  errors  presented  in  Table  2-2. 


5.2  Reference  Examples 

In  this  section  we  examine  a  variety  of  simple  examples  to  evaluate  the  performance  of  the 
precorrected-FFT  algorithm  and  illustrate  some  of  its  shortcomings. 


Figure  5-1:  (a)  A  sphere  discretized  into  960  panels.  The  discretization  is  refined  by  subdi¬ 
viding  the  spherical  triangle  defined  by  the  panel  vertices  into  four  triangular  panels,  whose 
vertices  are  the  midpoints  of  the  edges  of  the  original  spherical  triangle,  (b)  Solid  lines  show 
integrated  charge  error  for  the  sphere  with  Dirichlet  condition  of  Eq.  17.  Solid  line  shows  errors 
for  grid  code,  (x)  p  =  2  (*)  p  =  3  (+)  p  =  4.  Dashed  line  connecting  (o)  shows  error  for  l  —  2 
multipole  scheme. 
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(b)  CPU  time 


Figure  5-2:  The  cube  example,  (a)  Discretization  of  the  cube,  (b)  CPU  time,  in  seconds, 
needed  for  the  fast  multipole  (dashed  line  connecting  ‘x’)  and  precorrected-FFT  algorithms 
(*)  to  compute  a  matrix- vector  product.  For  the  precorrected-FFT  algorithm,  different  results 
are  possible  depending  on  whether  speed  or  memory  usage  is  to  be  optimized.  The  solid  line 
connects  runs  with  grid  sizes  chosen  to  minimize  memory  use.  (c)  Memory,  in  Mb,  needed  by 
the  fast  multipole  and  precorrected-FFT  algorithms,  (d)  Product  of  (b)  and  (c).  Note  the 
speed-memory  product  is  fairly  independent  of  grid  size. 
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The  fast  multipole  algorithms  used  in  the  FASTCAP  program  compute  matrix- vector  prod¬ 
ucts  in  0(n )  operations  regardless  of  the  distribution  of  panels  on  the  discretized  surfaces  [16], 
but  this  is  not  true  of  the  precorrected-FFT  method.  As  described,  the  use  of  the  FFT  implies 
that  the  algorithm  computes  matrix- vector  products  in  at  best  0(n  log  n)  operations,  and  at¬ 
tains  this  optimum  only  for  fairly  homogenous  distributions  of  panels  (see  Chapter  4).  That  is, 
for  problems  where  the  panels  are  distributed  in  a  roughly  uniform  manner  throughout  space, 
the  precorrected-FFT  method  should  be  efficient.  In  contrast,  for  inhomogeneous  problems 
which  consist  of  clusters  of  panels  separated  by  large  areas  of  open  space,  inefficiency  may 
be  expected.  Therefore  it  is  important  to  quantify  the  performance  penalty  induced  in  the 
precorrected-FFT  method  by  problem  inhomogeneity. 

A  simple  approach  to  generating  an  example  which  is  inhomogenous  is  to  refine  the  dis¬ 
cretization  of  a  cube.  The  cube  example  is  intended  to  serve  as  a  model  for  typical  boundary- 
element  discretizations  of  surfaces.  As  the  discretization  is  refined,  problems  with  increasing 
numbers  of  panels  will  be  generated.  The  precorrected-FFT  algorithm  must  place  grid  charges 
in  the  empty  interior  of  the  cube,  which  causes  the  CPU  time  and  memory  required  by  the 
algorithm  to  increase  faster  than  0(n  log  n).  As  n  increases,  relatively  more  panels  are  near  the 
edges  of  the  cube  relative  to  the  interior,  i.e.,  the  problem  inhomogeneity  increases.  Thus,  at 
some  large  n,  the  fast  multipole  methods  will  be  superior  to  the  precorrected-FFT  method.  We 
wish  to  determine  how  effective  the  precorrected-FFT  method  is  for  reasonable  size  problems, 
and  at  what  n  it  would  become  advantageous  to  use  the  fast-multipole  methods. 

Figure  5-2  shows  the  comparison  of  the  precorrected-FFT  method  at  p  =  3  to  the  fast- 
multipole  based  code  FASTCAP,  at  /  =  2,  for  the  cube  example.  The  discretization  of  the 
cube  is  refined  to  generate  more  panels,  and  the  performance  of  the  two  codes  compared  as  the 
problem  size  increases.  Three  figures  are  shown.  Figure  5-2  (b)  shows  the  time  required  for  each 
code  to  compute  a  matrix-vector  product,  Figure  5-2(c)  shows  the  amount  of  memory  needed 
by  each  code,  and  Figure  5-2  (d)  shows  a  figure  of  merit  which  is  the  product  of  required  memory 
and  the  time  needed  for  a  potential  calculation.  The  product  is  important  to  consider  when 
analyzing  the  precorrected-FFT  method  because,  as  is  clear  from  the  figure,  speed  can  be  traded 
for  memory  by  manipulating  the  size  of  the  region  the  grid-charge  approximation  covers.  The 
CPU  and  memory  figures  for  the  precorrected-FFT  method  are  observed  to  grow  irregularly 
with  problem  size.  This  is  because  our  specific  implementation  of  the  method  requires  the 
number  of  grid-charges  along  one  side  of  the  computational  domain  to  be  a  power  of  two.  The 
solid  line  in  the  figures  shows  results  when  the  number  of  grid  charges  along  a  side  was  selected 
to  optimize  (see  Section  3.7)  the  speed-memory  product,  which  is  observed  to  grow  smoothly. 
Two  cases  in  Figure  5-2(a)  are  evident  where  the  code  would  have  been  considerably  faster  had 
a  different  number  of  grid-charges  been  used.  However,  as  Figure  5-2(b)  shows,  the  memory 
required  would  have  been  greater  in  each  case. 

Analysis  of  the  trend  of  Figure  5-2(a)  reveals  that  the  CPU  time  needed  to  solve  the  cube 


Figure  5-3:  The  nonuniformly-discretized  cube  example.  86,400  panels  were  present  in  the 
discretization  shown,  (b)-(d):  Computational  resources  needed  for  the  fast  multipole  (x)  and 
precorrected-FFT  algorithms  (o)  to  compute  a  matrix-vector  product  for  the  nonuniformly 
discretized  cube  geometry.  (*)  shows  time  needed  by  the  precorrected-FFT  algorithm  for  the 
uniform-cube  discretization,  (b)  CPU  time,  in  seconds,  (c)  Memory,  in  Mb,  needed  by  the  fast 
multipole  and  precorrected-FFT  algorithms,  (d)  Product  of  (b)  and  (c). 
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problem  grows  as  about  0(n115),  where  n  is  the  number  of  panels,  faster  than  the  0(n)  expected 
asymptotically  for  the  fast  multipole  method.  However,  for  all  the  problems  analyzed,  the 
precorrected-FFT  method  was  superior  in  terms  of  CPU  time  and  memory  required.  We  may 
obtain  the  approximate  point  at  which  the  algorithms  cross  over  by  extrapolating  the  data 
in  Fig.  5-2  (c).  Assuming  that  the  CPU  time  and  memory  of  the  multipole  method  grow  as 
0(n),  and  that  the  CPU  time  and  memory  required  by  the  precorrected-FFT  method  grows  as 
0(n12)[38],  then  in  terms  of  the  speed-memory  product  the  precorrected-FFT  method  will  be 
superior  to  the  fast-multipole  method  until  n  is,  at  least,  several  million  panels.  We  estimate 
over  30  gigabytes  of  memory  would  be  needed  to  solve  such  a  problem. 

Another  type  of  inhomogeneity  is  commonly  encountered  in  practical  problems.  In  the  first 
cube  example,  with  the  exception  of  a  single  smaller  edge  panel,  the  discretization  was  mostly 
uniform,  i.e.  all  the  panels  were  roughly  the  same  size.  For  surfaces  with  edges  and  corners, 
however,  the  charge  density  varies  more  rapidly  near  the  edges,  and  so  it  is  often  desirable  to 
discretize  the  surface  such  that  there  are  small  panels  near  the  edges  and  larger  ones  in  the 
interior.  To  examine  the  effect  of  this  type  of  discretization  on  the  precorrected-FFT  method, 
“cosine  spacing”  was  used  to  achieve  a  non-uniform  discretization.  By  this  it  is  meant  that  a 
non-uniform  distribution  of  N  +  1  nodes  Xk  on  the  interval  [0, 1]  was  generated  from 

Xk  =  cos  -77;  k  =  0...  N  (5.4) 

and  the  panel  sizes  derived  from  the  resultant  spacings  Xk  —  x^-i-  As  can  be  seen  from  Figure 
5-3  (a),  this  generates  a  discretization  with  smaller  panels  near  the  edges  of  the  cube,  as  needed 
to  resolve  the  rapidly  varying  charge  density.  In  Figure  5-3  (b)-(d),  comparisons  are  made 
between  the  precorrected-FFT  and  fast  multipole  algorithms  applied  to  the  nonuniform  cube 
discretization,  as  well  as  to  the  resources  needed  for  the  uniform  cube.  Consider  in  particular 
plot  (d)  which  compares  the  speed-memory  products.  As  the  number  of  panels  is  increased,  the 
non-uniform  discretization  becomes  less  homogeneous  relative  to  the  uniform  one.  Comparing 
the  speed-memory  products  of  the  precorrected-FFT  algorithm  for  the  two  different  families  of 
discretizations,  we  see  that  the  precorrected-FFT  algorithm  is  indeed  less  efficient  on  the  non- 
uniformly  discretized  problems.  Both  CPU  time  and  memory  are  observed  to  grow  somewhat 
more  rapidly  with  problem  size  for  the  non-uniform  discretization  compared  to  the  uniform  one. 
However,  for  the  largest  problems  considered  the  increase  in  resources  used  is  only  about  20- 
30%.  In  fact  even  for  the  worst  cases  considered  the  precorrected-FFT  method  is  still  superior, 
in  terms  of  the  speed-memory  product,  to  the  fast  multipole  method.  This  is  primarily  due  to 
its  superior  memory  utilization. 

The  cube  examples  demonstrate  that  problems  exist  for  which  the  precorrected-FFT  algo¬ 
rithm  is  inferior  to  the  fast-multipole  methods.  This  example,  however,  is  somewhat  artificial, 
as  very  large  capacitance  extraction  problems  are  not  usually  due  to  very  fine  discretizations  of 
a  few  surfaces,  but  rather  by  fixing  a  discretization  level,  and  solving  problems  which  involve 
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Figure  5-4:  The  bus  crossing  example,  (a)  Larger  problems  are  generated  by  adding  more 
bus  lines,  (b)  CPU  time,  in  seconds,  needed  for  the  fast  multipole  (dashed  line  connecting 
‘x’)  and  precorrected-FFT  algorithms  (solid  line  connecting  '*’)  to  compute  a  matrix-vector 
product,  (c)  Memory,  in  Mb,  needed  by  the  fast  multipole  and  precorrected-FFT  algorithms, 
(d)  Product  of  (b)  and  (c). 


increasingly  more  complicated  structures.  For  a  situation  which  better  models  problems  from 
VLSI  interconnect  analysis,  consider  a  bus  crossing  example,  as  in  Figure  5-4.  In  this  example, 
a  series  of  stacked  bus  problems  are  solved.  The  faces  of  each  bus  line  are  broken  into  quadri¬ 
lateral  sections,  and  the  quadrilaterals  discretized  by  division  into  a  central  panel  and  five  edge 
panels.  In  order  to  generate  larger  and  larger  problems,  we  consider  b  levels  of  bus  wires,  each 
level  having  b  wires. 

From  Fig.  5-4  it  is  clear  that  the  computational  cost  of  the  algorithm  grows  nearly  linearly 
with  problem  size,  as  predicted  in  Chapter  4.  For  the  size  problems  considered,  the  precorrected- 
FFT  method  with  p  =  3  enjoys  an  advantage  of  more  than  a  factor  of  three  in  terms  of 
computational  cost  and  roughly  a  factor  of  four  in  memory  utilization  over  the  fast  multipole 
method  using  order-2  expansions.  With  these  parameter  values,  however,  from  Figure  5-1  we 
expect  the  precorrected-FFT  method  to  be  considerably  more  accurate. 

5.3  Realistic  Examples 

In  this  section  we  present  results  comparing  the  FASTCAP  program  to  the  precorrected- 
FFT  method  for  computing  capacitances  of  several  three-dimensional  geometries.  As  a  pre¬ 
conditioner  has  not  yet  been  implemented  in  the  precorrected-FFT  algorithm,  all  comparisons 
were  performed  without  FASTCAP’s  preconditioner.  Figure  5.3  shows  four  realistic  three- 
dimensional  structures:  a  woven  bus  structure,  a  bus  crossing  structure,  a  via  structure,  and 
part  of  an  SRAM  memory  cell.  We  have  compared  the  multipole-based  code  FASTCAP,  using 
multipole  expansion  order  l  =  2,  to  the  grid  based  methods  with  p  —  2  and  p  —  3.  To  estimate 
the  accuracy  of  the  computed  capacitances,  we  have  compared  the  results  to  the  grid-code  run 
using  p  —  6,  which  we  expect  to  introduce  errors  into  the  calculation  which  are  very  small  com¬ 
pared  to  the  p  =  2,  p  =  3  grid  codes  or  the  multipole  l  =  2  code.  As  a  check  on  this  assumption 
we  also  performed  the  calculations  using  the  fast  multipole  algorithm  and  sixth  order  multipole 
expansions.  Taking  the  p  =  6  capacitances  to  be  exact,  we  have  calculated  both  the  maximum 
relative  errors  in  the  computed  capacitance  coefficients,  as  well  as  the  maximum  over  all  rows 
of  the  capacitance  matrix  of  the  largest  error  in  the  row  as  a  fraction  of  that  row’s  diagonal 
capacitance.  Table  5-1  shows  the  computation  times,  memory  required,  and  error  estimates 
for  each  problem.  All  experiments  were  run  on  a  DEC  AXP3000/900,  with  256  megabytes  of 
physical  memory. 

The  table  indicates  that  multipole  expansions  of  order  2  are  usually  enough  to  give  relative 
accuracy  of  one  percent  or  so  in  the  calculated  capacitances.  In  terms  of  relative  errors  in  the 
computed  capacitances,  the  p  =  2  grid  code  appears  to  be  comparable  to  the  l  =  2  multipole 
code,  and  somewhat  inferior  when  the  error  is  measured  as  a  percentage  of  the  diagonal  capaci¬ 
tance.  The  p  —  3  grid  code  clearly  has  uniformly  superior  error  properties.  These  results  are  in 
accord  with  the  sphere  example  considered  previously  in  Figure  5-1.  Note  also  Table  5-3  which 


compares  the  errors  in  final  computed  capacitances  induced  by  the  grid-projection  scheme  when 
the  grid-collocation  technique  is  used  to  generate  interpolation  operators  to  the  errors  induced 
by  polynomial  interpolation.  Though  the  errors  in  these  examples  are  small  enough  to  make 
firm  statements  difficult,  it  seems  that  the  polynomials  are  less  accurate,  especially  for  the  wo¬ 
ven  bus  example  which  generates  the  largest  errors.  For  this  example,  it  is  interesting  that  both 
grid-projection  schemes  gave  more  accurate  answers  than  the  fast-multipole  based  codes.  It  is 
possible  that  this  is  because,  even  if  all  three  schemes  generated  similar  worst-case  errors,  the 
average  errors  for  the  grid-based  algorithm  will  be  lower  because  almost  all  the  panels  in  the 
system  are  well-separated  from  the  approximate  grid  charges.  In  contrast,  in  the  fast  multipole 
scheme,  because  of  the  multilevel  hierarchy,  for  any  given  approximation  of  a  panel  charge,  any 
panel  is  “close”  to  the  approximating  multipole  expansion. 

Table  5-2  shows  explicit  performance  comparisons  of  the  1  =  2  multipole  code  to  the  p  =  2 
grid  code,  which  has  comparable  accuracy,  as  well  as  to  the  more-accurate  p  =  3  grid  codes. 
At  p  =  3,  the  precorrected-FFT  method  can  be  as  much  as  four  times  faster  and  can  use 
as  little  as  one  fifth  the  memory  of  FASTCAP.  In  terms  of  the  speed-memory  product,  the 
grid-based  code  at  p  =  3  was  superior  by  a  factor  ranging  from  four  to  twenty.  At  p  =  2,  the 
performance  advantage  of  the  grid-code  was  even  more  significant.  The  CPU  advantage  of  the 
method  ranged  from  nearly  four  to  more  than  eight,  the  memory  advantage  from  four  to  six, 
and  the  product  from  twelve  to  fifty-two. 

The  two  final  entries  in  Table  5-1  are  worthy  of  note.  Using  the  p  =  2  grid  representation, 
from  which  we  expect  about  2-4%  accuracy,  it  was  possible  to  analyze  two  very  large  problems. 
The  first  is  a  fifteen-by-fifteen  wire  woven  bus  crossing,  shown  in  Fig.  5-6,  which  has  over  80,000 
panels  in  the  discretization.  The  second  is  the  cube,  discretized  into  about  125,000  panels.  The 
precorrected-FFT  method  was  able  to  perform  a  single  solution  (one  row  in  the  capacitance 
matrix)  in  only  about  three  minutes.  More  importantly,  both  problems  could  be  solved  in  the 
available  physical  memory,  which  was  not  possible  using  FASTCAP. 

We  note  that  all  the  examples  considered  here  generate  fairly  well-conditioned  linear  sys¬ 
tems,  so  no  preconditioner  was  necessary  to  secure  convergence  in  a  reasonable  time.  However, 
use  of  a  preconditioner  could  further  reduce  the  computation  times  required  for  the  linear 
system  solution,  and  enable  solution  of  less  well-conditioned  problems. 

Finally,  we  wish  to  emphasize  that,  regardless  of  whether  the  precorrected-FFT  or  fast- 
multipole  based  approaches  are  used,  the  advantage  of  the  accelerated  schemes  over  traditional 
algorithms  is  tremendous.  Table  5-4  compares  the  computation  time  and  memory  needed  by 
algorithms  based  on  LU-factorization  via  Gaussian  elimination,  iterative  solution  using  direct 
(explicit)  matrix-vector  products,  and  iterative  solution  using  accelerated  matrix-vector  prod¬ 
ucts,  as  described  in  this  paper.  The  statistics  for  the  direct  algorithms  were  estimated  by 

1The  large  relative  errors  for  the  woven  bus  problems  occur  because  some  coupling  capacitances  are  very 
small,  0.4%  of  the  self  capacitance  for  the  15x15  woven  bus  where  the  largest  relative  error  occurs. 
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FIGURE  5-5:  Several  realistic  capacitance  extraction  problems,  (a)  The  woven  bus  example 
(woven5x5).  (b)  the  comb  drive  example  (comb),  (c)  The  via  example  (via),  (d)  the  SRAM 
example(SRAM) . 
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Example  [m] 

Code 

Setup 

Solve 

CPU 

Memory 

%  err/rel 

%  err/diag 

via  [4] 

6120 

FFT[p  =  2] 
FFT[p  =  3] 

10.85 

22.18 

33.03 

15.83 

1.41 

0.81 

10.60 

61.96 

72.56 

20.92 

0.068 

0.026 

FASTCAP 

18.56 

101.32 

119.88 

55.82 

0.369 

0.152 

woven5x5[10] 

9360 

FFT[p  =  2] 
FFT  \p  =  3] 

5.8 

125.2 

131.0 

20.35 

7.06 

1.65 

32.63 

282.4 

315.03 

49.49 

1.60 

0.048 

FASTCAP 

37.37 

656.3 

693.67 

103.8 

16.9 

0.431 

cube[l] 

. 

14406 

FFT  [p  =  2] 
FFT[p  =  3] 

4.80 

8.80 

13.6 

25.03 

0.105 

0.105 

14.37 

9.93 

24.3 

36.55 

0.003 

0.003 

FASTCAP 

34.59 

28.78 

63.37 

115.81 

0.024 

0.024 

bus3x6[6] 

6480 

FFT[p  =  2] 
FFT[p  =  3] 

6.38 

18.52 

24.9 

11.92 

1.30 

0.99 

6.59 

44.70 

51.29 

15.74 

0.033 

0.015 

FASTCAP 

21.75 

184.2 

205.95 

75.49 

1.89 

0.164 

bus3x8[6] 

11520 

FFT[p  =  2] 
FFT[p  =  3] 

5.67 

52.1 

57.77 

20.22 

1.10 

0.416 

15.2 

82.86 

98.06 

32.64 

0.021 

0.021 

FASTCAP 

30.52 

328.38 

358.9 

119.9 

1.96 

0.177 

SRAM  [6] 

3944 

FFT[p  =  2] 
FFT[p  =  3] 

3.92 

11.83 

15.75 

7.71 

0.90 

0.45 

8.30 

28.02 

36.32 

16.70 

0.046 

0.023 

FASTCAP 

11.89 

81.22 

93.11 

38.92 

0.62 

0.082 

comb  [3] 

19424 

FFT[p  =  2] 
FFT[p  =  3] 

28.9 

91.5 

120.3 

50.3 

2.11 

1.74 

23.2 

315.8 

339.0 

71.1 

0.056 

0.043 

FASTCAP 

70.2 

399.3 

469.5 

211.0 

0.12 

0.11 

wovenl5[30] 

82080 

FFT  [p  =  2] 

134.2 

6152.8 

6287.0 

246.3 

63.81 

1.81 

cube[l] 

126150 

FFT  \p  =  2] 

73.6 

127.5 

201.1 

224.7 

- 

- 

Table  5-1:  Statistics  for  FASTCAP  Order-2,  Grid-2,3  codes  for  1  jr  Green  function.  Setup, 
solve,  and  CPU  times  are  in  seconds  on  DEC  AXP  3000/900,  memory  in  megabytes,  m  is 
number  of  conductors  in  problem,  each  conductor  requires  a  separate  linear  system  solution. 


Example 

Grid  Order 

Setup 

Solve 

CPU 

Memory 

Product 

via 

P  =  2 

0.58 

0.22 

0.28 

0.28 

0.078 

P  =  3 

0.57 

0.61 

0.61 

0.37 

0.23 

woven5x5 

P  =  2 

0.16 

0.19 

0.19 

0.20 

0.037 

P  =  3 

0.87 

0.43 

0.45 

0.48 

0.22 

cube 

P  =  2 

0.14 

0.31 

0.21 

0.22 

0.046 

P  =  3 

0.42 

0.34 

0.38 

0.32 

0.12 

bus3x6 

P  =  2 

0.29 

0.10 

0.12 

0.16 

0.019 

P  =  3 

0.30 

0.24 

0.25 

0.21 

0.052 

bus3x8 

P  =  2 

0.19 

0.16 

0.16 

0.17 

0.027 

P  =  3 

0.50 

0.25 

0.27 

0.27 

0.074 

SRAM 

P  =  2 

0.33 

0.15 

0.17 

0.20 

0.034 

P-  3 

0.70 

0.34 

0.39 

0.43 

0.17 

comb 

P  =  2 

0.41 

0.23 

0.26 

0.24 

0.062 

P  =  3 

0.33 

0.80 

0.72 

0.34 

0.24 

Table  5-2:  Comparison  of  FASTCAP  and  grid  codes.  Figures  are  ratios  of  required  resources. 
“Product”  is  product  of  CPU  and  memory  figure. 


Example 

Colloc  %  err  rel 

Poly  %  err  rel 

via 

0.068 

0.090 

woven5x5 

1.6 

5.0 

cube 

0.003 

.0001 

bus3x6 

0.033 

0.049 

bus3x8 

0.021 

0.066 

SRAM 

0.046 

0.2 

comb 

0.056 

0.072 

Table  5-3:  Comparison  of  errors  in  capacitances  induced  by  using  grid-collocation  operators  for 
interpolation  and  polynomials  for  interpolation.  Both  columns  use  grid-collocation  for  the  grid 
projection  step. 


Figure  5-6:  The  large  woven  bus  structure.  The  30  conductor  structure  is  formed  from  fifteen 
conductors  woven  around  fifteen  straight  conductors.  The  actual  discretization  is  finer  than 
shown  in  the  above  figure.  There  are  82,080  panels  in  the  actual  discretization. 


Example 

CPU  Usage 

Memory  Usage 

Name 

Panels  [cond] 

P/FFT 

FASTCAP 

Direct  Iterative 

LU  Decomp 

P/FFT 

Direct 

via 

6120[4] 

1.1  min 

2.0  min 

(5.6  min) 

(1.9  hrs) 

21  Mb 

(286  Mb) 

woven5x5 

9360(10] 

5.2  min 

12  min 

(42  min) 

(6.9  hrs) 

50  Mb 

(668  Mb) 

woven 15 

82080(30] 

1.7  hrs 

- 

(11.5  days) 

(194  days) 

246  Mb 

(50.2  Gb) 

cube 

126150(1] 

3.3  min 

- 

(8.4hrs) 

(2.7  yrs) 

225  Mb 

(119  Gb) 

Table  5-4:  Comparision  of  capacitance  extraction  algorithms.  Figures  in  parentheses  are  esti¬ 
mates. 


extrapolating  timings  of  computations  performed  by  MATLAB,  and  by  assuming  that  stor¬ 
age  is  in  double-precision  floating  point  words.  For  the  largest  problems,  the  speedups  can  be 
two  orders  of  magnitude  over  iterative  solution  with  direct  products,  and  nearly  six  orders  of 
magnitude  over  Gaussian-elimination,  with  memory  savings  a  factor  of  500. 


The  Laplace  Equation  in  Stratified 

Media 


6.1  Motivation 

Many  applications  requiring  the  solution  of  the  Laplace  equation  have  more  complicated 
boundaries  and/or  boundary  conditions  than  free  space.  It  is  sometimes  useful  to  formulate 
the  problem  by  calculating  a  Green  function  which  accounts  for  the  special  geometry  of  a 
system,  thereby  removing  part  of  the  problem  domain  from  consideration.  For  an  arbitrary 
Green  function  g(x\x')  the  precorrected-FFT  algorithm  is  not  applicable  since  the  grid-potential 
calculation  cannot  be  expressed  as  a  discrete  convolution.  However,  an  important  special  case 
where  the  pre-corrected  FFT  algorithm  can  still  be  applied  occurs  in  a  system  where  the  Green 
function  is  modified  by  the  presence  of  planar  structure,  for  example,  ground  planes  or  planar 
dielectric  interfaces.  To  illustrate  the  problem  and  its  solution,  first  consider  the  case  of  a  set 
of  conducting  bodies  over  a  single  ground  plane  located  at  z  =  0  and  extending  to  infinity  in 
the  x  and  y  directions. 

By  the  method  of  images,  the  potential  at  x  =  (x,  y,  z )  due  to  a  single  charge  at  x'  = 
(x',y',z')  is 

y,  z)  =  g(x  -x',y-  y',  z  -  z')  -  g{x  -  x',  y  -  y',  z  +  z')  (6.1) 

where  g  is  the  free-space  Green  function.  The  difficulty  for  the  precorrected-FFT  method  is  that 
the  second  term  depends  on  z  +  z',  a  general  occurrence  for  problems  with  planar  interfaces. 
The  matrix  mapping  grid  charges  to  grid  potentials  is  the  sum  of  a  matrix  with  block-Toeplitz 
structure,  corresponding  to  the  first  term  of  Eq.  6.1,  and  a  matrix  with  block-Hankel  structure, 
corresponding  to  the  second  term  of  Eq.  6.1.  The  Toeplitz-like  part  of  the  matrix  corresponds 
to  the  discrete  convolution  with  the  free-space  Green  function,  and  can  be  treated  directly  with 
the  FFT  as  described  above.  Because  a  Hankel  matrix  is  related  to  a  Toeplitz  matrix  via  a 
permutation  matrix[36]  which  is  simple  to  compute,  multiplication  by  a  Hankel  matrix  may  also 


be  done  in  0(N  log  N)  time  via  the  FFT.  Furthermore,  as  we  will  see,  the  permutation  matrix 
may  be  represented  in  Fourier  space  so  that  multiplication  of  a  vector  by  the  sum  of  a  Hankel 
and  Toeplitz  matrix  can  be  performed  using  a  single  forward  and  inverse  FFT  pair.  Thus,  at 
each  iteration  of  GMRES,  the  only  additional  computation  required  to  incorporate  a  modified 
Green  function  is  multiplication  in  Fourier  space  by  a  diagonal  matrix  and  a  permutation 
matrix,  which  requires  negligible  additional  computation  time. 

An  important  special  case  of  the  general  class  of  Laplace  problems  is  the  case  of  piecewise- 
constant  dielectrics.  In  such  a  system,  the  potential  ip  generated  by  a  charge  density  p(x)  in 
the  material  with  dielectric  constant  ej  satisfies,  for  x  not  on  an  interface, 


VV(*)  =  (6-2) 

ej 

where  €j  is  the  dielectric  constant  in  the  jth  dielectric.  At  a  dielectric  interface,  the  potential 
ip  satisfies  the  boundary  conditions  [33] 


dip  _  dip 
6+  dn+  6~  dn~ 

ip(+)  =  ip(~) 


(6.3) 

(6.4) 


where  dip / dn+ ,  dip / dn~  denote  the  normal  derivatives  of  the  potential  ip  taken  from  the  side 
of  the  interface  with  dielectric  constant  e+,e“  respectively,  and  ip(+)  and  ip(-)  the  potential 
on  opposite  sides  of  the  interface. 

Two  different  multipole  accelerated  algorithms  have  been  proposed  for  problems  with  dielec¬ 
tric  interfaces.  In  [56],  based  on  [9],  the  interface  is  replaced  by  a  fictitious  charge  distribution 
constructed  to  satisfy  the  interface  boundary  conditions  (6.3,  6.4).  The  introduction  of  the 
fictitious  charges  requires  the  discretization  of  the  dielectric  interface.  There  are  four  potential 
problems  with  discretizing  the  interface.  First,  the  accurate  discretization  of  the  interfaces  may 
itself  be  problematic.  It  is  possible  to  construct  adaptive  discretization  procedures [57],  but  such 
algorithms  generally  have  a  computational  cost  several  times  that  requires  for  a  single  integral 
equation  solution.  Such  algorithms  are,  moreover,  not  widely  available.  Second,  discretization 
of  the  interfaces  requires  increasing  the  number  of  panels  in  the  discretized  integral-equation 
solver.  For  problems  with  small  numbers  of  widely-spaced  conductors,  the  number  of  additional 
panels  required  to  discretize  the  interface  could  significantly  exceed  the  number  of  panels  on 
the  conductors  of  interest,  resulting  in  an  unacceptably  high  computational  cost.  Third,  as  we 
will  see  later,  the  addition  of  discretized  interfaces  can  worsen  the  conditioning  of  the  matrix 
equation  that  must  be  solved  to  obtain  the  surface  charge  densities.  This  results  in  increased 
computational  cost  and  possible  decreased  accuracy  of  the  solution.  Finally,  straightforward 
integral  equation  formulations  for  problems  with  large  ratios  of  dielectric  constant  can  introduce 
scaling  difficulties  that  make  accurate  determination  of  the  charge  on  the  original  conductors 
difficult  [58,  59].  On  the  other  hand,  when  solving  problems  involving  dielectric  interfaces  that 
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are  not  planar,  the  appropriate  Green  functions  are  not  in  general  easily  computable,  and  so 
the  equivalent-charge  formulations  are  necessary,  as  they  have  the  advantage  that  the  dielectic 
interfaces  can  have  any  shape.  In  that  case,  once  the  fictitious  charge  has  been  introduced,  the 
problem  is  reduced  to  a  free-space  Laplace  boundary-value  problem,  to  which  the  precorrected- 
FFT  method  can  be  applied  with  advantages  and  drawbacks  as  discussed  in  Chapter  5. 

In  reference  [15],  the  discretization  of  dielectric  interfaces  is  avoided  by  working  directly  with 
the  Green  function  associated  with  the  potential  of  a  charge  in  a  system  of  layered  dielectrics. 
The  Green  function  is  approximated  by  a  weighted  collection  of  image  charges.  These  fictitious 
image  charges  can  be  incorporated  in  a  modified  version  of  the  fast  multipole  algorithm  in 
order  to  calculate  the  potential  of  the  collection  of  panel  charges.  This  approach  avoids  the 
disadvantages  associated  with  discretizing  the  dielectric  interfaces.  However,  the  multipole 
domain  must  be  extended  to  encompass  the  regions  occupied  by  the  fictitious  image  charges. 
The  computational  cost,  in  time  and  memory,  is  thus  proportional  to  the  number  of  panels 
times  the  number  of  images.  The  number  of  images  may  be  quite  large  (twenty  or  thirty). 
The  authors  of  [15]  restricted  their  study  to  cases  with  two  interfaces  where  the  conductors  lie 
strictly  in  one  layer.  The  algorithm  can  in  principle  be  extended  to  accomodate  more  interfaces, 
but  such  an  algorithm  would  likely  be  very  costly  and  unwieldy. 

In  this  chapter,  we  describe  how  the  precorrected-FFT  algorithm  can  be  used  to  evaluate 
the  potential  of  a  system  of  charges  in  a  system  of  piecewise-constant  dielectrics,  with  infinite 
planar  interfaces.  Unlike  in  [15],  the  cost  of  the  potential  evaluation  is  shown  to  be  essentially 
independent  of  the  Green  function  used.  The  distribution  of  conductors  among  multiple  dielec¬ 
tric  layers  can  introduce  inefficiencies  relative  to  free-space  problems,  however.  We  restrict  our 
study  to  problems  that  have  at  most  two  planar  interfaces,  though  in  principle  the  methods 
discusses  herein  can  be  extended  to  an  arbitrary  number  of  interfaces.  For  simplicity  we  will 
limit  our  exposition  to  cases  with  conductors  lying  only  in  the  top  two  spaces;  the  extension  to 
conductors  in  all  three  spaces  is  trivial. 


6.2  Calculating  Green  functions  for  planar  geometries 

6.2.1  Green  function  derivation 

Consider  representing  the  potential  by  the  integral 

ip(x)  =  J  cr(x')g(x,x')dS  (6.5) 

S(x,  x') 


If  the  g(x,  x')  satisfies 


Vlg{x,E)  =  -- 


(6.6) 


then  '(/;  will  satisfy  (6.2).  To  simplify  notation,  for  the  time  being  consider  the  problem  of  a 
point  charge  at  the  origin, 


e  =  6_i 


Figure  6-1:  Two-interface  layered  dielectric  problem.  Interfaces  extend  to  infinity  in  x  and  y 
directions. 


It  will  be  convenient  to  introduce  the  Fourier  transform  g  of  g) 


g{S)  =  /_*  /_"  dad/3  e-^e-^yg(a,  (3,  z) 


'  (2  iryj-ooJ-oo  ^ 

In  a  material  with  dielectric  constant  6,  g  satisfies  the  equation 


&*g(a>P>z)  _  2  _  Kz) 

dz 2  P  j  e 


and  the  boundary  conditions  (6.4,  6.3).  Letting  7 2  =  a2  +  /32,  the  differential  equation  for  g 
can  be  written 


*(7’2) = JJf  ■ 

For  a  single  uniform  dielectric,  g  can  be  determined  by  solving 


72  5(7^)  =  0 


with  the  boundary  conditions 


This  has  the  solution 


<7(7, 0+)  =  0(7,0  ) 

dgirh f)  _  dg(7> z)  _  _I 

dz+  dz~  e 


5(7^)  =  -^f^Z  1 


Since  we  know  that,  for  a  infinite  uniform  dielectric, 

9%&)  =  ^ _ 11  J~- 


(6.10) 


(6.11) 


(6.12) 

(6.13) 


(6.14) 


(6.15) 


'  47re||f-f'||  v  ' 

we  can  conclude,  by  uniqueness  of  solutions  to  the  Laplace  equation,  that  the  inverse  Fourier 
transform  of  g  is 


-Tx  r  r  dadj =  -  X  (6.16) 

(2vr)2  J- 00  7-oo  276  47re||a:  -  z'|| 

Consider  the  three- layer  dielectric  structure  shown  in  Figure  6.2.1.  For  a  source  located  at 
z',0  <  z'  <  d,  (in  the  middle  layer),  the  Fourier-domain  Green  function  is 


-ja(x-x')e-j0(y-y')  1  ^/rU-z'l  _ 
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'  ’  *  27e0  +  27£0 

for  z  >  d,  and  for  0  <  z  <  d, 


J R+R-e~^z~z'  +2d)  +  R_e~'1/(z+z')  +  R+R-e-^z+z>S>  +  R+e~^z~z">\ 

(6.17) 


g(j,z,z')  =  -  l(Z  z)+fej)  \R+R_e-^2d-*+*')  +  R_e-^z+z')  +  R+R-e-^2d-z'+z)  +  R+e-^™-*-^} 
Z'yeo  27Co  l  j 


where 


R+  = 
R-  = 


and 


5(7,  d)  = 


60  ~  Cl 
Co  +  Cl 
Co  -  c-i 
Co  +  C-l 

1 


(6.18) 

(6.19) 

(6.20) 


1  -  R+R-e-W 


(6.21) 


For  a  source  at  z'  >  d,  we  have  for  z  >  d 

S(7,z,z')  =  [e-T'lz-2'l  -  e^z+z'-2d'>  +  T+R-S('y,d)e~^z+z')  +  T+S('y,d)e-^z+z'-2dA 

2^61  L  J 


5(7, z') 


where 


27c! 


(6.22) 

[e-7(*'-*)  +  R-S( 7,  d)e-7(2'+2)  +  5(7,  d)i?+i?_e-'r(2d-2+z')]  0  <  z  <  d 

(6.23) 


T+  = 


2ei 

eo  + 


(6.24) 


Note  that  this  formulation  can  account  for  the  presence  of  a  ground  plane  at  one  of  the 
interfaces  by  taking  ei  — ►  oo  or  e_i  — >  — oo,  as  appropriate,  in  which  case  -R+  =  —1  or  R-  =  —1 
respectively.  Care  must  be  taken  if  both  — ►  —  1  as  then  the  function  S(j,  d)  is  singular 

at  7  =  0.  A  symmetry  plane  can  also  be  accomodated  by  taking  e\  — ►  0  or  c_i  — >  0  as  needed. 

The  key  to  the  applicability  of  the  precorrected-FFT  method  is  that,  as  is  clear  from  in¬ 
spection  of  Equations  6.17-6.23  and  the  linearity  of  the  Fourier  transform,  each  of  the  Green 
functions  can  be  decomposed  into  two  terms, 


g( x,tf)  =  f{x  -  x',y-  y',z  -  z')  +  h(x  -  x',y  -  y',z  +  z') 


(6.25) 


6.2.2  Approximating  the  Green  functions 

If  it  were  not  for  the  factor  5(7,  d),  all  the  terms  in  the  Green  functions  above  would  be 
exponentials.  As  the  inverse  Fourier  transform  of  e-7l2l  is  the  potential  due  to  a  point-charge, 
#  if  the  function  S(j,d)  could  be  represented  as  a  sum  of  exponentials,  then  the  Green  function 

could  be  represented  as  a  sum  of  the  potentials  of  weighted  point-charges.  In  fact  the  inverse 
Fourier  transform  of  each  of  the  functions  corresponds  to  the  potential  of  a  infinite  series  of 
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images.  In  general,  for  a  problem  with  n  interfaces,  the  Green  function  can  be  expressed  in 
terms  of  an  (n  -  l)-fold  nested  infinite  series  of  image  charges.  These  series  may  converge  very 
slowly,  however,  and  even  if  they  converge  rapidly,  if  the  number  of  interfaces  is  large,  summing 
the  contribution  from  all  the  significant  images  may  be  expensive.  Therefore  it  is  worthwhile 
to  look  for  approximations  to  the  Green  functions  by  a  finite  number  of  images. 

S( j,d)  can  be  written  in  terms  of  the  function 


by 


F(u)  = 


Ke~u 


1  -  Ke~u 


(6.26) 


S(j,d)  =  l  +  F(yd) 


(6.27) 


where  K  =  R+R-.  For  the  two-interface  problem,  finding  an  image  approximation  to  the  Green 
functions  is  then  equivalent  to  approximating  F(u)  by 

N 

F{u)^^2aiebiU  (6.28) 

i— 1 

and  generally  we  will  want  to  require  that  the  hi  be  real  and  negative[60].  It  is  of  course  also 
desirable  that  N  be  small.  We  solve  the  approximation  problem  by  collocating  at  judiciously 
chosen  points  uj,j  =  1 . . .  2 N.  That  is,  we  require 

F(Uj)  =  (6.29) 

i—  1 

for  each  j.  The  resulting  system  of  nonlinear  equations  is  solved  via  a  continuation  technique[61], 
with  K  as  the  continuation  variable.  Many  other  methods  to  fit  a  sum  of  exponentials  to  a 
function  have  been  proposed,  such  as  the  classical  Prony’s  method[62]. 


6.3  Algorithmic  Implications 

6.3.1  The  interpolation/projection  operators 

We  note  that  the  Green  function  for  the  layered  media  case  may  be  obtained  by  the  method 
of  images,  and  thereby  decomposed  into  several  terms,  each  interpretable  as  potential  due  to 
a  charge  in  free-space.  The  precise  analytic  form  of  the  decomposition,  such  as  into  multiply- 
infinite  series,  may  be  cumbersome,  but  what  is  important  is  to  note  that  such  a  decomposition 
exists.  Since  the  potential  is  due  to  a  set  of  image  charges  each  having  the  free-space  Green 
function,  Theorem  2.1  implies  that  the  projection  of  charge  onto  the  grid  and  interpolation  of 
grid  potentials  can  be  done  by  using  the  same  operators  as  for  charges  in  free  space.  Further,  we 
note  that  the  distance  to  any  image  charge  will  be  greater  than  the  distance  to  the  actual  charge. 
Thus  the  error  bounds  derived  in  Chapter  2  for  representation  of  Laplace  potentials  in  free  space 
hold  for  the  grid-collocation  scheme  in  layered  media  as  well.  Therefore,  to  incorporate  layered 


media  into  a  capacitance  extraction  code,  the  only  change  in  the  precorrected-FFT  method  is 
in  computation  of  grid  potentials  due  to  grid  charges. 

A  more  insightful  viewpoint  can  be  obtained  by  noting  that  one  interpretation  of  the  grid- 
collocation  approach  of  Chapter  2  is  that  it  is  a  means  to  obtain  interpolation  operators  which 
are  highly  accurate  for  harmonic  functions.  The  precise  form  of  source  of  the  harmonic  potential 
is  not  particularly  important.  Since  the  Green  functions  satisfy  the  Laplace  equation  except 
at  sources  and  layer  boundaries,  neither  of  which  are  treated  by  the  grid,  any  operator  which 
accurately  interpolates  such  functions  can  be  used  for  projection  and  interpolation  in  the  grid- 
base  scheme. 

6.3.2  Modifications  to  the  FFT-based  convolution 

Two  modifications  to  the  convolution  step  in  precorrected-FFT  approach  are  needed  when 
moving  from  free-space  problems  to  problems  with  planar  interfaces.  First,  the  code  must  be 
modified  to  account  for  the  presence  of  Hankel-like  structure.  To  see  how  this  can  be  done, 
consider  a  one  dimensional  example.  Define  the  matrices  T,H  €  Rrixn  by 


Tij  =  i  —  j 


Hij  =  i+j-  5 


(6.30) 


The  n  =  4  matrices  are 


'  0 

-1 

-2 

-3  ' 

’  -3 

-2 

-1 

0  ' 

1 

0 

-1 

-2 

H  = 

-2 

-1 

0 

1 

2 

1 

0 

-1 

-1 

0 

1 

2 

3 

2 

1 

0 

0 

1 

2 

3 

(6.31) 


For  a  vector  x  6  Rn,  to  compute  Tx  we  first  embed  T  into  a  circulant  matrix  Ct  G  R 


2nx2n 


CT  = 


0 

-1 

-2 

-3 

Z 

3 

2 

1 

1 

0 

-1 

-2 

-3 

z 

3 

2 

2 

1 

0 

-1 

-2 

-3 

z 

3 

3 

2 

1 

0 

-1 

-2 

-3 

z 

z 

3 

2 

1 

0 

-1 

-2 

-3 

-3 

z 

3 

2 

1 
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-1 
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-2 

-3 
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3 

2 

1 

0 

-1 

_  -1 

-2 

-3 
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3 

2 

1 

0 

(6.32) 


where  2  denotes  an  arbitrary  entry.  As  Ct  is  circulant,  it  has  the  eigendecomposition  Ct  = 
FhAF,  where  F  is  the  discrete  Fourier  transform  matrix  normalized  by  ^/n,  i.e.  FH F  —  I. 
The  diagonal  entries  of  A  are  Fc,  where  c  is  the  first  column  of  Ct-  To  compute  Tx  using  Ct, 
x  is  first  embedded  into  a  vector  xp  of  length  2n, 


Xp  = 


x 

0 


(6.33) 
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a  step  sometimes  referred  to  as  “zero-padding.”  Now  we  note  that 


Tx 

y 


0 

-1 

-2 

-3 

z 
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1 
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0 
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-3 
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-3 
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-3 

-3 

z 

3 

2 

1 

0 

-1 
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-3 

z 

3 
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1 

0 

-1 

_  -1 

-2 

-3 

z 

3 

2 

1 

0 

x 

0 


=  CTxP  =  FHAFxP  (6.34) 


where  y  €  R"  is  a  vector  to  be  discarded.  To  perform  this  computation,  Fxp  and  Fc  are 
computed  using  the  FFT.  The  multiplication  by  FH  is  simply  the  inverse-FFT. 

Let  x'  denote  x  with  the  entries  written  backwards,  i.e. 

x'k  =  Zn-Jt+l  (6-35) 

Likewise,  let  H'  denote  H  with  the  columns  similarly  reordered  such  that  H'x'  =  Hx  for  any 
x.  We  have 


H'  = 


0 

1 

2 

3 


-1 

0 

1 

2 


-2  -3 
-1  -2 
0  -1 
1  0 


=  T 


(6.36) 


Thus  H  is  related  to  T  by  a  permutation  matrix  P, 

0  0  0  1 


P  = 


(6.37) 


0  0  10 
0  10  0 
10  0  0 

and  in  fact  any  Hankel  matrix  is  related  to  a  Toeplitz  matrix  by  a  permutation  matrix  P  which 
has  ones  along  the  principal  anti-diagonal.  Note  that  x'  =  Px.  Thus,  Hx  =  H'x1  =  HPPx  = 
TPx.  So  we  see  that  multiplication  by  either  a  Toeplitz  or  a  Hankel  matrix  can  be  performed 
by  the  FFT.  What  is  less  obvious  is  that  multiplication  by  matrices  of  the  form  T  +  H,  with  T 
and  H  arbitrary  Toeplitz  and  Hankel  matrices  respectively,  can  be  performed  via  a  single  FFT. 
This  is  possible  because  it  happens  that  the  FFT  of  x'  can  be  easily  computed  from  the  FFT 
of  x. 

Take  x  €  R^  and  let  x'  e  KN  be  the  the  reversed  vector,  i.e. 

x'n  =  XN-n- 1  (6.38) 

Let  X,  X'  be  the  discrete  Fourier  transforms  of  x,  x'  respectively: 


AT-l 


Xk  =  Fx=  Yj  xne 


— 27r  ikn/N 


(6.39) 


n= 0 
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X'k  =  Fx'  =  Y  x'ne-^ikn/N  =  Y  XN-n-ie 
n— 0  n=0 

Letting  m  —  N  —  n  —  1, 

X'k  =  Y  xme-2^N-m-^N 


-27 rikn/N 


or 


(FPx)k  =  X'k  =  e^ik'NX*k 


(6.40) 


(6.41) 


(6.42) 


where  Xk  denotes  complex  conjugate  of  Xk.  An  additional  factor  is  needed  to  take  care  of  the 
zero-padding.  If 


xP  = 


x 

0 


(6.43) 


then  the  desired  transform  is 


(Fx'P)k  =  (-1  )ke2*ik'N(FxP)t 


(6.44) 


Thus  we  see  that  the  cost  of  computing  the  necessary  auxiliary  transform  is  quite  small  relative 
to  the  FFT  cost,  so  the  cost  of  multiplication  by  the  sum  of  a  Hankel  and  a  Toeplitz  matrix  is 
only  marginally  greater  than  the  cost  of  multiplication  by  a  Toeplitz  matrix. 

The  second  issue  arising  in  the  layered  media  problem  is  that  the  Green  function  is  now 
only  piecewis e-C°°  away  from  the  source  point,  whereas  in  free-space  it  was  C°°  everywhere 
away  from  the  source  point.  Thus,  if  there  are  k  layers  which  contain  conductors,  there  are  k2 
separate  pieces  to  the  Green  function  which  need  to  be  considered.  If  there  are  conductors  in 
only  one  layer  the  situation  is  substantially  the  same  as  for  free-space.  Otherwise,  k  separate 
grids  must  be  introduced,  each  lying  strictly  in  one  layer.  Each  grid  must  lie  strictly  in  a  single 
layer  as,  due  to  the  discontinuity  in  the  derivative  of  the  potential  at  a  dielectric  interface, 
interpolation  across  interfaces  will  not  be  accurate.  There  are  then  k 2  separate  Toeplitz  and 
Hankel  matrices  which  correspond  to  the  pairwise  grid-potential  contributions  from  each  layer 
to  each  other  layer. 

Let  hmn ,  thm  denote  the  Hankel  and  Toeplitz  parts,  respectively,  of  the  Green  function  for 
a  source  in  layer  n  and  observation  point  in  layer  m.  The  computation  of  the  grid  potential 
for  layer  m  becomes 

=  Y  E  (*«»»(*  -  i',j  ~j',k  -  k')  +  hmn(i  -i',j  -j',k  +  k'))  qn(i',j',k')  (6.45) 
n 

Letting  P  denote  the  sparse  operator  which  is  the  three-dimensional  analogue  of  the  Fourier- 
space  reversal  operation  of  Eq.  6.44,  the  convolution  step  of  the  algorithm  becomes 


/*  Modified  Convolution  Step  */ 
for  each  layer  m 

Compute  Qm  =  FFT(<?m) 
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dz2 


dz 2  ~h  d2i 


Figure  6-2:  The  grid  spacing  difficulty.  If  the  grids  in  two  layers  have  unequal  spacing, 
d32  ~f~  ^21  and  the  grid-grid  mapping  matrices  Hmn  will  not  have  Toeplitz  structure. 

Compute  n  =  Y,m  H  mn  (/  +  P)Qn 
for  each  layer  m 

Compute  ipm  =  FFT-1(’®,m) 


6.3.3  Constraints  on  grid  size  and  spacing. 

In  the  free-space  case  the  number  grid  points  in  each  direction  (nx,ny,nz)  and  the  spacing 
A  of  the  grid  charge  points  is  purely  a  matter  of  computational  efficiency  (at  least,  for  the 
Laplace  type  kernels  this  chapter  is  concerned  with).  This  is  essentially  true  in  the  layered 
media  problem,  as  long  as  all  the  conducting  surfaces  lie  in  the  same  dielectric  layer.  Of 
course  care  must  be  taken  to  insure  that  interpolation  across  an  interface  is  not  attempted. 
When  the  conductors  lie  in  different  layers,  the  situation  becomes  more  complicated.  Figure 
6.3.2  illustrates  the  principle  difficulty.  In  order  for  the  grid-potential  matrices  to  have  the 
appropriate  Toeplitz  structure,  the  grid  spacing  must  be  the  same  in  each  layer.  In  addition, 
in  order  to  be  able  to  exploit  the  FFT,  the  number  of  grid  points  ( nx ,  ny,  nz)  must  be  the  same 
for  the  grids  in  each  layer.  This  can  result  in  computational  inefficiency. 

For  each  layer  m  let  dm  denote  the  maximum  difference  in  ^-coordinate  between  any  two 
points  on  conductors  lying  in  layer  m.  For  example,  in  Fig.  6.2.1,  do  <  d.  We  must  have 

nz A  >  max  dm  (6.46) 

Til 

for  the  grid  to  cover  the  conductors.  If  di  <  maxm  dm  for  some  layer  i,  there  will  be  many 
points  in  the  grid  of  layer  i  which  do  not  represent  any  charge  density.  Thus,  we  expect  that 
the  precorrected-FFT  algorithm  will  be  less  efficient  in  layered  media,  for  any  given  geometry, 
than  its  free-space  counterpart. 
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Example[m] 

Panels 

Code 

Setup 

Solve 

CPU 

Memory 

via  [4] 

6120 

FFT[LM] 

FFT[FS] 

19.6 

82.3 

10.60 

61.96 

1 

FASTCAP  [FS] 

18.56 

101.32 

woven5x5[10] 

9360 

FFT[LM] 

FFT[FS] 

61.1 

MEMM 

mmfim 

49.49 

FASTCAP  [FS] 

iia 

Ki££&2fl 

bus3x8[6] 

11520 

FFT[LM] 

FFT[FS] 

15.2 

82.86 

98.06 

32.64 

FASTCAP  [FS] 

30.52 

328.38 

358.9 

119.9 

SRAM[6] 

3944 

FFT[LM] 

FFT[FS] 

23.6 

38.7 

62.4 

21.6 

|£jl!B 

16.70 

FASTCAP  [FS] 

BUB 

38.92 

comb[3] 

19424 

108.3 

23.2 

71.1 

FASTCAP  [FS] 

70.2 

Table  6-1:  Statistics  for  FFTCAP-layered  media  code,  FFTCAP  free-space  code,  and  FAST- 
CAP.  Setup,  solve,  and  CPU  times  are  in  seconds  on  DEC  AXP  3000/900,  memory  in 
megabytes,  m  is  number  of  conductors  in  problem,  each  conductor  requires  a  separate  lin¬ 
ear  system  solution.  The  LM  code  used  the  ground-plane  Green  function,  the  FS  codes  use 
the  free-space  Green  function,  and  have  no  equivalent-charge  discretized-interfaces  for  these 
problems. 


6.4  Examples 

First  we  demonstrate  that  the  modified  precorrected-FFT  algorithm  is  an  efficient  approach 
for  realistic  engineering  geometries.  A  single  interface  (a  ground  plane)  was  added  to  some  of 
the  example  problems  from  Section  5.3.  Table  6-1  shows  the  computational  resources  required 
by  the  free-space  precorrected-FFT  method  [FS],  the  layered- media  precorrected-FFT  code 
[LM],  and  the  fast-multipole  based  FASTCAP.  From  Table  6-1  it  is  clear  that  utilization  of  the 
layered  media  Green  function  does  not  result  in  a  significant  performance  penalty,  at  least  for 
the  case  when  all  the  panels  are  in  the  same  dielectric  layer.  The  increases  in  resource  usage  are 
primarily  due  to  inefficiencies  introduced  by  the  additional  code  needed  to  manipulate  Hankel 
matrices  and  handle  the  various  different  possible  dielectric  configurations. 

Next  we  compare  the  precorrected-FFT  method  with  layered-media  Green  functions  to  the 
competing  approach  of  discretizing  the  dielectric  interface.  For  this  study,  we  take  as  a  canonical 
problem  the  calculation  of  the  capacitances  of  two  spheres.  The  spheres  have  radius  lm.  Their 
centers  are  located  1.05m  above  the  plane,  and  3m  from  each  other.  The  discretized  interface 
version  of  this  problem  is  shown  in  Figure  6-3. 

To  generate  appropriate  discretizations,  we  first  consider  the  case  of  two  spheres  over  a 
ground  plane.  First  an  adequate  discretization  of  the  sphere  is  generated  by  uniformly  refining 
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Example 

Code 

Setup 

Solve 

Memory 

via 

FFT 

1.85 

1.33 

1.16 

FMM 

IHiMI 

■n:« 

0.43 

woven5x5 

1.23 

warn 

0.59 

bus3x8 

FFT 

2.8 

1.45 

1.36 

FMM 

1.4 

0.37 

EE9H 

SRAM 

FFT 

2.84 

FMM 

EE9I 

comb 

FFT 

0.81 

1.52 

FMM 

EH 

0.64 

0.51 

Table  6-2:  Comparison  of  layered- media  code  to  free-space  FFTCAP  and  FASTCAP  codes 
for  the  problems  of  Table  6-1.  Figures  are  ratios  of  resources  required  by  the  layered  media 
code  to  those  needed  by  the  free-space  precorrected-FFT  code  (FFT  in  table)  or  the  free-space 
FASTCAP  (FMM)  code. 


the  discretization  until  the  self  and  mutual  coupling  capacitances  of  the  spheres  converges  to 
within  a  tolerance  of  1%  of  the  diagonal  value.  Next  the  ground  plane  is  added,  and  the 
discretization  (again  uniform)  refined  until  1%  relative  convergence  is  achieved.  Finally,  using 
the  same  spacing  in  the  ground  plane,  the  ground  plane  width  is  increased  to  convergence.  The 
final  discretization  had  1600  panels  in  the  ground  plane  and  about  880  on  each  sphere. 

Table  6-3  shows  the  relative  computational  resources  required  by  the  various  approaches. 
While,  for  a  given  structure,  the  layered-media  code  requires  more  computational  effort  than 
the  free-space  code,  the  increased  requirements  are  more  than  compensated  by  the  extra  work 
required  when  interfaces  must  be  discretized.  Due  to  the  larger  number  of  panels  present 
when  interfaces  must  be  discretized,  the  time  required  for  a  matrix-vector  product  increases 
considerably.  In  addition,  the  matrix  appears  to  be  less  well-conditioned  as  indicated  by  the 
increased  number  of  iterations  required  for  the  GMRES  solver  to  converge.  Note  that  the  time 
required  by  the  fast  multipole  algorithm  to  compute  a  matrix-vector  product  with  a  single 
dielectric  interface  is  almost  six  times  more  than  the  precorrected-FFT/layered  media  code 
requires.  Even  for  the  case  of  a  symmetry  plane,  which  is  handled  in  the  free-space  codes  by 
solving  the  capacitance  problem  for  four  spheres,  the  layered-media  code  gives  a  considerable 
savings  in  time  and  required  memory. 

Table  6-4  shows  the  computed  capacitances  of  two  spheres  lying  in  a  dielectric  half-space. 
e+  is  the  dielectric  constant  in  the  upper  half-space  where  the  conducting  hemispheres  are 
located.  e_  is  the  dielectric  constant  in  the  other  half  space.  Even  for  very  large  values  of  the 
dielectic  constant,  the  computed  capacitances  seem  to  be  well-behaved.  Table  6-4  also  shows 
the  number  of  matrix-vector  vector  products  needed  for  the  two  linear  system  solutions.  As 
before,  the  free-space  code  with  discretized  interfaces  needs  more  iterations  to  converge.  Note 
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FIGURE  6-3:  The  two  sphere  problem  with  discretized  interface.  Actual  discretization  is  shown. 


Example 

Code 

Setup 

Solve 

Memory 

Products 

MVP  Time 

groundplane 

FFT-LM 

giaw 

6.1 

24 

WE2MBHM 

FFT-DI 

■’If— 

76 

mamm 

FMM-DI 

52.1 

izm 

76 

0.69 

symmetry  plane 

FFT-LM 

4.2 

4.9 

6.1 

24 

FFT 

4.4 

18.6 

11.4 

60 

0.31 

FMM 

17.6 

32.4 

26.4 

62 

0.52 

dielectric  interface 
e+  =  1,  e_  =  3 

FFT-LM 

6.1 

22 

FMM-DI 

47.3 

66 

0.76 

dielectric  interface 
e+  =  3,  e_  =  1 

FFT-LM 

4.2 

4.8 

6.1 

24 

0.4 

FMM-DI 

15.1 

48.5 

47.2 

64 

0.76 

Table  6-3:  Two  sphere  example.  Table  gives  computational  resources  required  by  the  layered 
media  precorrected-FFT  code  [FFT-LM]  and  the  free-space  precorrected-FFT  [FFT-FS]  and 
fast-multipole  [FMM]  codes.  The  free  space  codes  use  discretized  interfaces  to  account  for  the 
planar  portion  of  the  geometry,  except  for  the  symmetry  plane  case,  which  is  implemented  by 
duplication  of  the  conductor  discretiztion. 


e+ 

e_ 

Cs 

mm 

mm 

£2522 

1 

2 

146.7 

■EMBM 

Bjtgjig 

IB 

1 

4 

172.8 

172.4 

-40.47 

-40.35 

22 

70 

1 

8 

201.2 

200.5 

72 

1 

16 

226.3 

225.2 

22 

73 

1 

32 

244.6 

22 

74 

1 

64 

mm 

-23.56 

-22.39 

71 

1 

1000 

mm 

-19.67 

-18.23 

Emm 

93 

1 

269.5 

268.1 

-19.40 

X 

2 

1 

227.3 

227.5 

-85.00 

-84.95 

i 

4 

1 

423.5 

424.2 

24 

62 

8 

1 

814.4 

64 

16 

1 

66 

32 

1 

3139 

3148 

-1326 

24 

66 

64 

1 

6241 

6259 

-2648 

24 

68 

1000 

1 

9695 

9724 

-4132 

24 

82 

10000 

1 

96950 

97220 

-41310 

-41270 

24 

291 

Table  6-4:  Mutual  (Cm)  and  self  ( Cs )  capacitance,  in  pF,  of  spheres  in  dielectric  half-spaces, 
with  varying  dielectric  constants.  Spheres  are  of  radius  1  with  centers  located  1.05  above  inter¬ 
face.  X  indicates  the  non-preconditioned  GMRES  solver  did  not  converge  in  available  memory. 
Superscript  “DI”  indicates  capacitances  calculated  using  the  equivalent  charge  formulation  with 
discretized  interfaces. 

that  for  e+  >  e_,  the  discretized  interface  approach  generates  matrices  which  appear  to  become 
ill-conditioned.  In  fact  for  the  largest  dielectric  ratio  considered,  the  iterative  solver  did  not 
converge  without  use  of  a  preconditioner.  The  deviation  in  the  calculated  capacitances  is  also 
observed  to  increase  as  the  ratio  of  dielectric  constants  becomes  very  high.  At  the  largest  ratio 
considered,  the  relative  deviation  in  the  mutual  capacitance  is  nearly  8%.  From  the  data  in  this 
table,  there  appears  to  be  no  concern  for  practical  problems,  but  there  could  be  an  indication 
of  an  underlying  numerical  difficulty. 

A  more  pathological  example  was  examined  to  determine  if  use  of  the  layered  media  Green 
function  can  give  more  accurate  results  than  discretizing  the  interface.  Instead  of  two  spheres 
over  a  ground  plane,  consider  the  case  of  two  hemispheres,  as  shown  in  Figure  6-4.  The  spheres 
each  rest  on  the  interface  of  a  half-space.  As  these  are  open  surfaces  with  edges,  it  is  in  fact 
not  clear  whether  the  capacitances  are  bounded  as  the  discretization  is  refined,  and  indeed  to 
obtain  indications  of  convergence,  much  finer  discretizations  of  the  hemispheres  were  needed 
than  for  the  spheres.  About  10000  panels  were  used  for  the  ground  plane  discretization,  and 
each  hemisphere  was  discretized  into  2600  panels,  which  corresponds  to  panels  roughly  six  times 
smaller  than  for  the  sphere  example. 

Table  6-5  shows  the  computed  capacitances  for  the  hemispheres  in  dielectric  half-spaces, 
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FIGURE  6-4:  The  two  hemisphere  problem  with  discretized  interface.  Actual  discretization  is 
finer  than  shown. 

analogous  to  Table  6-4.  Table  6-6  gives  the  computational  performance  figures  for  the  various 
test  cases.  When  e+  <  e_,  both  codes  give  very  similar  results,  which  leads  us  to  believe  that 
both  give  accurate  answers.  However,  for  the  reverse  case  e+  >  e_,  the  free-space  code  with 
discretized  interfaces  gives  capacitances  which  increasingly  deviate  from  those  calculated  by  the 
layered-media  code.  Note  that  as  e_  — *■  oo  the  capacitances  calculated  by  the  layered-media 
code  appear  to  be  converging  to  those  for  the  case  of  two  hemispheres  over  an  infinite  ground 
plane  of  Cs  =  293.3 pf,  Cm  —  -5.83 pf.  Cm  calculated  by  discretizing  the  dielectric  interfaces 
is  clearly  in  error.  Note  also  that  nonconvergence  is  reached  for  a  lower  value  of  dielectric 
ratio  than  for  the  sphere  example.  Experiments  were  performed  with  the  sphere  example,  with 
the  spheres  very  close  to  the  ground  plane,  and  no  significant  qualitative  differences  from  the 
results  above  were  obtained.  This  leads  us  to  believe  that  the  singular  nature  of  the  charge 
distribution  at  the  hemisphere  edges  is  interacting  with  a  difficulty  in  the  integral  formulation 
for  the  dielectric  interfaces  in  such  a  way  as  to  make  the  calculation  inaccurate.  The  precise 
nature  of  the  difficulty  is  unclear,  but  it  may  have  to  do  with  the  generation  of  a  non-trivial 
nullspace  in  the  potential  coefficient  matrix  as  e_  — >  oo  [58,  59]. 

Finally  we  consider  some  more  realistic  geometries  and  some  multi-layer  structures.  The  first 
is  the  air  bridge  structure[15]  shown  in  Figure  6-5.  In  this  structure  a  groundplane  is  present  at 
z  =  Ocm  and  a  dielectric  interface  present  at  z  =  0.5cm.  To  generate  an  example  with  panels  in 
multiple  dielectric  layers,  we  have  augmented  this  problem  by  adding  a  second  trace,  parallel 
to  the  bottom  trace,  at  z  =  0.5833cm,  which  lies  in  the  uppermost  dielectric  layer.  We  also  in 
this  section  quote  results  for  the  comb  structure,  for  which  a  realistically  meshed  ground  plane 
is  available  and  included  in  the  FMM  (FASTCAP)  computations.  A  two  hemisphere  example, 
with  the  hemispheres  lying  on  a  groundplane,  in  a  dielectric  layer  of  dielectric  constant  5,  with 


e+ 

€— 

-CfT- 

Ml— 

mm 

MVP 

MVPm 

1 

2 

122.7 

122.6 

MBtMMB 

48 

113 

1 

4 

156.6 

156.5 

-30.76 

-30.35 

50 

128 

1 

8 

195.1 

195.1 

-28.24 

-27.18 

53 

138 

1 

16 

230.2 

230.9 

-22.78 

-20.7 

56 

146 

1 

32 

256.4 

258.1 

-16.82 

-13.6 

56 

153  j 

1 

273.1 

275.6 

— 

-8.07 

56 

156 

1 

291.9 

295.5 

56 

X 

2 

i 

164.8 

164.9 

-50.35 

-50.39 

44 

100 

4 

i 

294.0 

294.3 

-94.29 

-94.41 

40 

101 

8 

i 

549.6 

550.3 

-181.0 

-181.3 

38 

16 

i 

1059 

1061 

-353.8 

-354.3 

36 

96 

32 

i 

2078 

2081 

-699.0 

-700 

36 

100 

64 

i 

4115 

4121 

-1389 

-1391 

35 

100 

Table  6-5:  Mutual  (Cm)  and  self  (Cs)  capacitance,  in  pF,  of  hemispheres  in  dielectric  half¬ 
spaces,  with  varying  dielectric  constants.  Superscript  “DI”  indicates  capacitances  calculated 
using  the  equivalent  charge  formulation  with  discretized  interfaces.  X  indicates  the  non- 
preconditioned  GMRES  solver  did  not  converge  in  available  memory.  • 


Example 

Code 

Setup 

Solve 

Memory 

Products 

MVP  Time 

groundplane 

FFT-LM 

18.6 

48.0 

23.3 

56 

0.85 

FFT-DI 

45.2 

216.1 

118.0 

102 

2.12 

FMM-DI 

50.8 

279.3 

137.8 

103 

2.71 

symmetry  plane 

FFT-LM 

18.7 

28.9 

22.9 

34 

0.85 

FFT 

14.9 

32.9 

37.8 

26 

1.27 

FMM 

28.2 

44.3 

88.5 

28 

1.58 

dielectric  interface 
e+  =  l,e_  =  3 

FFT-LM 

18.9 

42.7 

23.2 

50 

0.85 

FMM-DI 

(63) 

(608) 

299.0 

124 

4.9 

dielectric  interface 
e_|_  =  3,  e_  =  1 

FFT-LM 

18.9 

34.2 

23.0 

40 

0.86 

FMM-DI 

(63) 

(493) 

296.0 

102 

4.83 

Table  6-6:  Two  hemisphere  example.  Table  gives  computational  resources  required  by  the 
layered  media  precorrected-FFT  code  [FFT-LM]  and  the  free-space  precorrected-FFT  [FFT- 
FS]  and  fast-multipole  [FMM]  codes.  The  free  space  codes  use  discretized  interfaces. 
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Figure  6-5:  The  airbridge  example.  Actual  discretization  is  finer  than  shown. 


a  second  dielectric  layer,  dielectric  constant  1,  above,  is  also  considered. 

In  general,  the  results  are  in  accord  with  what  was  discussed  above:  for  the  single-interface 
problems,  the  precorrected-FFT  method  is  significantly  more  efficient  than  the  fast  multipole 
method  applied  to  an  equivalent  charge  (discretized-interface)  formulation. 

However,  when  the  second  trace  is  added  to  the  air  bridge  example,  the  layered-media 
precorrected-FFT  code  becomes  much  less  efficient.  It  requires  four  times  fewer  iterations  to 
converge  than  FASTCAP  (and  note  that  this  advantage  in  realistic  applications  would  likely 
be  ameliorated  by  use  of  a  preconditioner),  but  has  only  a  factor  of  two  advantage  in  matrix- 
vector  product  time.  In  fact,  when  the  second  trace  is  added  to  the  air  bridge  example,  the 
matrix- vector  product  time  increases  by  over  a  factor  of  two,  even  though  the  number  of  panels 
increases  by  only  about  30%.  This  is  due  to  the  inefficient  decomposition  of  space  imposed  by 
the  use  of  the  FFT.  It  is  encouraging,  however,  that  the  setup  time  of  the  precorrected-FFT 
method  did  not  increase  greatly  (only  about  50%),  despite  the  use  of  a  much  more  complicated 
Green  function.  It  is  also  worth  noting  that  the  precorrected-FFT  method  still  used  significantly 
less  memory. 

In  summary,  in  this  chapter  we  demonstrated  that  the  precorrected-FFT  method  is  an 
effective  approach  for  electrostatic  analysis  of  conducting  bodies  in  stratified  media.  Despite 
becoming  much  less  efficient  when  conductors  are  located  in  multiple  dielectric  layers,  the 
precorrected-FFT  approach  is  still  attractive,  as  in  many  cases  it  is  still  more  efficient  than  the 
equivalent-charge  formulations,  particularly  in  terms  of  memory  utilization.  The  qualitative 
advantages  of  not  having  to  generate  interface  discretizations  are  even  more  compelling. 


Example[m,n] 

Code 

Setup 

Solve 

Memory 

Products 

MVP  Time 

comb[2, 13024] 

FFT-LM 

65.5 

78.2 

60.4 

51 

1.53 

FMM 

70.3 

211 

77 

3.36 

two  hemispheres[2,5200] 

FFT-LM 

33.3 

WHOM 

23.7 

75 

0.53 

ei  =  l,eo  =  4,  groundplane 

FMM 

58.9 

531.0 

194 

3.67 

air  bridge[2,1120] 

FFT-LM 

8.5 

5.72 

4.64 

31 

0.18 

ei  =  1,  e0  =  5,  groundplane 

FMM 

12.1 

107.1 

48.7 

136 

0.78 

air  bridge,  second  trace[3,1504] 

FFT-LM 

12.3 

21.6 

10.5 

51 

0.42 

ei  =  1,  e_  =  5,  groundplane 

FMM 

13.8 

168.8 

54.5 

197 

0.86 

Table  6-7:  Realistic  and  multi-interface  examples,  m  is  the  number  of  non-interface  conductors 
in  the  problem,  n  the  number  of  panels,  excluding  the  interface. 


Helmholtz  Problems 


In  the  previous  two  chapters  we  gave  a  detailed  discussion  of  the  performance,  advantages, 
and  limitations  of  the  precorrected-FFT  method  as  applied  to  problems  with  Laplace  kernels. 
In  fact,  most  of  the  statements  made  apply  to  any  kernel  that  has  a  convolutional  structure 
and  is  singular-smooth.  By  singular-smooth  we  mean  a  function  that  is  smooth  on  any  length 
scale  as  long  as  it  is  examined  sufficiently  far  away  from  the  origin.  Examples  of  singular- 
smooth  functions  are  1/r,  e~r2/r,  l/(r2  +  a2).  The  problem  becomes  much  harder  when  treating 
functions  that  have  an  additional  length  scale,  such  as  sin  kr/r.  Arbitrarily  far  away  from  the 
origin,  this  function  oscillates  on  a  scale  of  the  wavelength  A  =  27 x/k.  If  the  approximation  of 
the  function  away  from  the  origin  does  not  resolve  oscillations  on  the  scale  of  A  the  computed 
answers  to  an  integral  equation  involving  this  kernel  will  be  inaccurate.  The  advantage  of  the 
precorrected-FFT  method  is  that  it  is  easier  to  adapt  to  oscillatory  kernels  than,  for  example, 
algorithms  which  rely  on  a  multi-level  spatial  decomposition.  What  is  given  up  is  that  the 
precorrected-FFT  method  can  never  achieve  optimal  computational  complexity. 

In  this  chapter  we  consider  a  simple  problem  to  demonstrate  that  the  precorrected-FFT 
method  can  effectively  compute  the  matrix- vector  products  stemming  from  a  boundary-element 
discretization  of  an  integral  equation  with  a  kernel  of  the  form  elkr jr  and  its  derivatives.  As  a 
model  problem  we  consider  the  scalar  Helmholtz  equation  with  Dirichlet  boundary  conditions. 
This  will  be  the  oscillatory-kernel  analogy  of  the  first-kind  integral  equation,  Eq.  1.2.  We  will 
show  that,  as  long  as  the  required  accuracy  is  not  large,  the  precorrected-FFT  method  can 
treat  the  oscillatory  kernel  case  with  a  surprisingly  small  increase  in  computational  resources 
relative  to  the  static  (1/r)  case. 

7.1  Formulation  and  Discretization 


We  wish  to  solve 


V2ip(x)  +  k2tp(x)  =  0  a;  €  R3\.D 


95 


Figure  7-1:  Condition  number  k  of  matrix  P  vs.  wavenumber  k  for  240-panel  sphere,  derived 
from  integral  equation  based  on  single-layer  (SL,  t]  — ►  oo),  double-layer  (DL,  rj  =  0),  and 
combined-layer(CL,  rj  =  1)  potentials. 


where  the  domain  D  is  a  bounded  region  in  R3  with  boundary  dD  consisting  of  a  finite  num¬ 
ber  of  disjoint  closed  bounded  surfaces,  and  the  exterior  R3\D  is  assumed  to  be  connected. 

This  problem  can  be  cast  into  an  integral  equation  form  using  monopole,  dipole  or  combined- 
layer  potentials  [10].  We  will  consider  combined-layer  potentials  as  the  produce  linear  system 
which  are  generally  better  conditioned.  More  importantly,  integral  equations  derived  from  the 
combined-layer  potentials  do  not  suffer  from  the  irregular-frequency  problem  as  do  equations 
derived  from  the  plain  single-layer  and  double-layer  densities.  At  an  irregular  frequency,  the 
integral  equation  ceases  to  have  a  unique  solution  at  a  set  of  discrete  frequency  values  which  cor¬ 
respond  to  interior  resonant  frequencies.  This  non-uniqueness  of  the  integral  equation  corrupts 
numerical  solution  of  the  equation  in  the  vicinity  of  the  irregular  frequency. 

In  the  combined-layer  case,  the  potential  is  represented  by 

ip(x)  =  f  {Gn{x,x')  -  irjG(x,x')}a(x')da',  x6S  (7.2) 

Js 

where  x,  x'  €  R3,  S  is  a  multiply-connected  two  dimensional  surface  in  R3,  G(x,  x')  =  elk^x~x'^ / A-k\\x- 
x'\\  is  the  Green’s  function  for  the  Helmholtz  equation,  Gn  is  the  surface  normal  derivative  of 
G  at  x\  cr(x')  is  the  combined-layer  density  often  referred  to  as  a  charge  density,  and  r/  is  a 
complex  scalar  which  depends  on  k. 

For  each  point  x  G  dD  for  which  u(x)  is  specified,  the  charge  density  satifies 

— +  [  Gn(x,x')a(x')da'  —  n?  /  G(x,x')a(x')da'  =  u(x)  (7.3) 

2  Js  Js 
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Figure  7-2:  Effect  of  choice  of  integral  equation  on  numerically  computed  scattered  fields  near 
interior  resonances.  Solid  line:  real  part  of  exact  solution.  Dashed  line:  imaginary  part  of  exact 
solution,  x:  computed  real  part  of  solution.  +:  computed  imaginary  part  of  solution.  Top 
plots  show  single-layer  potential,  bottom  plots  show  combined-layer  potentials,  at  ka  —  3.0  and 
ka  =  3.2.  The  single-layer  solution  is  inaccurate  around  the  resonance  ka  =  it,  but  accurate 
elsewhere.  The  combined-layer  solution  is  accurate  near  both  away  from  the  resonance  and 
near  it.  No  matrix  approximations  were  performed  for  these  computations. 
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(7.4) 


and  for  each  point  x  where  un(x)  is  specified  the  charge  density  satisfies 

y  Js  Gn>{x,x')a{x')da'  +  irj- Jg  G(x,x')a{x')da'  =  un(x). 

In  this  chapter  we  will  only  consider  problems  of  the  form  of  Eq.  7.3  (Dirichlet  problems). 

To  discretize  Eq.  7.3  we  use  a  similar  discretization  scheme  as  was  used  for  the  first-kind 
integral  equations  of  Chapters  1-6.  The  charge  density  sigma  is  expanded  in  basis  functions 

n 

^(®)  *  (7-5) 

7=1 

where  the  6t  are  piecewise-constant  on  triangular  or  quadrilateral  panels.  That  is,  the  surface 
is  tiled  with  polygons,  and  the  6i  are  given  by 

6i( x)  =  1  x  e  polygon  i  (7.6) 

Oi(x)  =  0  otherwise. 


A  discrete  set  of  equations  is  obtained  by  enforcing  a  collocation  condition  to  produce  the  linear 
system 

Pq  =  u  (7.7) 

for  the  panels  charges  q,  where  the  matrix  elements  Pij  are  given  by 

Pij  =  ^  +  J  Gn(xi,x')9j(x')da'  -  ir]  J^G(xi,x')9j(x')da',  (7.8) 

Ui  —  u(xi),  and  xt  refers  to  the  centroid  of  polygon  i. 

We  emphasize  that  the  irregular-frequency  problem  is  due  to  the  choice  of  integral-equation 
and  is  not  related  to  any  matrix  approximations,  such  as  made  by  the  precorrected-FFT  method. 
At  any  frequency,  the  solution  to  the  exterior  Dirichlet  problem  does  have  a  unique  solution[10j. 
At  an  irregular  frequency,  the  integral  equation  itself  possesses  a  non-unique  solution,  as  any 
multiple  of  an  interior  resonant  solution  lies  in  the  nullspace  of  the  integral  operator.  This 
interior  resonant  solution,  however,  does  not  generate  scattered  (exterior)  fields.  When  an 
approximate  discretization  is  introduced,  however,  the  various  integral  operators  may  be  poorly 
approximated  such  that  numerical  solutions  near  the  resonant  frequency  become  inaccurate. 
Additional  approximations  such  as  made  by  the  precorrected-FFT  method  will  exacerbate  any 
such  numerical  error.  Thus  when  using  matrix-sparsification  techniques  we  must  be  particularly 
careful  to  only  apply  the  methods  to  well-posed  integral  equations. 

To  see  the  importance  of  eliminating  the  interior  resonance  effects  present  in  integral  equa¬ 
tions  derived  from  pure  single-layer  and  double-layer  potentials,  consider  Figures  7-1  and  7-2. 
A  sphere  of  radius  a  was  coarsely  discretized  (240  triangular  panels)  in  order  to  solve  the 
boundary-value  problem  ip  (a)  =  1.  This  has  solution  ip  =  aeik(-T~a^ /r.  The  singularity  of  the 


FIGURE  7-3:  The  3840  panel  discretization  of  the  sphere. 


integral  operator  is  often  manifested  in  ill-conditioning  of  the  potential  coefficient  matrix  P. 
Fig.  7-1  shows  the  condition  number  k  of  this  matrix  for  potential  coefficients  derived  from 
the  single-layer,  double-layer,  and  combined-layer  potentials.  We  see  that  there  are  apparently 
several  internal  resonances  in  the  interval  k  €  [0,4].  In  particular,  the  single-layer  equation  has 
resonances  where  jn(ka)  =  0  for  n  =  0,1,...,  the  first  of  which  occurs  at  ka  =  i r.  Therefore 
we  expect  the  numerically  computed  solution  to  be  inaccurate  near  ka  =  7r.  This  is  indeed 
the  case,  as  is  demonstrated  in  Fig.  7-2  in  the  upper-right  plot.  When  ka  is  away  from  7 r,  the 
single-layer  equation  gives  accurate  answers  (at  least,  as  accurate  as  can  be  expected  given  the 
coarse  discretization).  At  ka  =  3.2,  however,  the  computed  fields  are  grossly  inaccurate.  On 
the  other  hand,  the  combined-layer  potential  solution  is  accurate  at  all  values  of  ka. 


7.2  Computational  Examples 


Now  we  demonstrate  that  the  precorrected-FFT  technique  can  accurately  compute  solutions 
of  integral  equations  with  an  oscillatory  kernel.  Again  assume  a  sphere  of  radius  a,  but  now 
with  a  more  complicated  boundary  condition.  In  particular,  let  the  boundary  value  u  be  given 
by  a  plane  wave: 


u  —  —e 


ikz 


(7.9) 


Discretizations  of  the  sphere  were  generated  by  recursively  subdividing  the  spherical  triangles 
produced  by  the  vertices  of  8,20  and  60-face  polyhedra.  Fig.  7-3  shows  a  3840  panel  discretiza¬ 
tion.  Figure  7-4  shows  the  relative  average  errors  for  four  different  wavenumbers,  as  a  function 
of  1/n,  where  n  is  the  number  of  panels  in  the  discretization.  As  in  Figure  5-1,  the  errors 
decrease  as  the  discretization  is  refined,  reaching  an  asymptotic  value  at  the  errors  induced  by 
the  grid  approximation.  For  any  discretization,  the  higher  the  frequency,  the  less  accurate  the 
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Figure  7-5:  Scattered  fields  for  plane- wave  boundary  condition  on  sphere,  ka  =  25.  Solid  line: 
real  part  of  exact  solution.  Dashed  line:  imaginary  part  of  exact  solution,  x:  computed  real 
part  of  solution.  +:  computed  imaginary  part  of  solution. 

sphere  discretization  is,  and  so  the  errors  are  higher.  Note  that  the  asymptotic  errors  for  a 
given  grid  order  will  be  roughly  the  same  as  for  the  Laplace  case  (Figure  5-1),  since  as  n  — ►  oo, 
kA  where  A  is  the  cell  size  will  go  to  zero. 

To  demonstrate  the  nature  of  the  field  solutions  at  higher  frequencies,  we  take  ka  =  25, 
corresponding  to  a  sphere  about  8  wavelengths  in  diameter.  The  sphere  was  discretized  into 
61440  panels.  Figure  7-5  shows  the  computed  fields.  The  matrix  solution  phase  of  the  compu¬ 
tation  required  about  10  minutes  on  a  DEC  600/333  and  used  740MB  of  memory.  About  13 
CPU  seconds  are  needed  for  a  matrix- vector  product.  The  average  relative  error  in  the  com¬ 
puted  combined- layer  density  was  about  2%,  and  the  maximum  error  in  the  computed  fields 
was  likewise  a  few  percent,  p  =  4  grid  approximations,  with  kA  =  2.4,  where  A  is  the  cell  size, 
were  needed  for  accurate  answers.  Even  though  in  realistic  applications  such  a  high  frequency 
problem  would  be  better  treated  by  higher-order  discretizations  than  piecewise-constant  collo¬ 
cation,  this  example  illustrates  that  it  is  possible  to  solve  large-scale  problems  with  relatively 
small  computation  times.  The  memory  requirements,  however,  are  still  fairly  large. 

For  purposes  of  comparison,  solving  the  Laplace  equation,  using  a  single-layer  potential,  on 
the  same  geometry  using  precorrected-FFT  accelerated  GMRES,  with  p  =  3,  requires  about 
32  seconds,  with  convergence  occuring  in  seven  iterations  and  requires  215  MB  of  memory. 
The  matrix  vector  product  time  is  about  4.6  seconds.  Note  that  the  penalty  in  matrix-vector 
product  time  for  switching  from  the  real- valued  single-layer  1  jr  potential  to  the  complex  valued 
combined-layer  potential,  with  higher  order,  is  only  a  factor  of  2.8!  This  is  less  than  the  factor 
of  four  penalty  induced  by  switching  from  direct  multiplication  of  a  real  matrix  times  a  real 
vector  to  a  complex  matrix  times  a  complex  vector.  This  is  possible  because  the  FFT  of  a 
complex  sequence  requires  somewhat  less  than  twice  as  much  time  to  compute  as  the  FFT  of  a 
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Figure  7-6:  Computational  resources  needed  to  solve  the  combined-layer  potential  integral 
equation  on  sphere  of  radius  a  =  1.  (x)  Fixed-A;  experiment.  (*)  Variable- A:  experiment.  (-) 
Linefit  to  variable-A:  points.  (-.)  Linefit  to  k  =  0.5  points. 
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real  sequence.  Thus  we  expect  the  relative  performance  benefit  of  using  the  precorrected-FFT 
method  over  standard  direct  techniques  to  increase  for  the  Helmholtz-potential  case  compared 
to  the  Laplace-case.  A  Gaussian  elimination  solution  of  this  problem  would  have  required  about 
a  month  of  CPU  time  on  a  200MFLOPS  workstation,  and  about  56GB  of  storage. 

Next  we  analyze  the  behavior  of  the  precorrected-FFT  method  as  a  function  of  problem 
size,  for  Laplace  and  Helmholtz  kernels.  Figure  7-6  presents  the  primary  experimental  results 
of  this  chapter.  Two  experiments  were  conducted  and  the  CPU  time  required  for  a  matrix 
vector  product,  in  seconds  on  a  DEC  AXP  3000/900,  and  memory  in  MB  necessary  for  the 
computations,  are  shown  in  the  figure,  as  a  function  of  the  number  of  panels  N  in  the  discretized 
system.  In  both  experiments  we  seek  to  obtain  2%  relative  mean-square  error  in  the  combined- 
layer  charge  density  (which  can  be  analytically  computed).  The  2%  error  level  was  selected 
because  it  allows  the  largest  dynamic  range  for  the  limited  set  of  discretized  spheres  available; 
in  any  event  this  choice  of  parameter  is  in  principle  arbitrary. 

The  first  experiment,  the  results  of  which  are  shown  in  the  lower  curve  marked  ka  —  0.5, 
involved  solving  the  combined-layer  integral  equation,  for  the  plane-wave  boundary  condition, 
at  fixed  ka  =  0.5  for  each  of  the  available  n-panel  sphere  discretizations.  As  ka  =  0.5  was  small 
for  each  of  the  n,  2%  accuracy  was  generally  obtained  with  grid  order  p  =  2.  From  the  dash-dot 
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FIGURE  7-7:  (o)  Resources  required  for  FMM,  l  =  2,  to  solve  Eq.  1.2.  (*)  variable-A;  experiment 
from  Fig.  7-6  Left  figure  shows  CPU  time  needed  for  matrix-vector  product,  right  figure  shows 
needed  memory,  in  MB. 

line  which  is  a  fit  to  the  experimental  results,  we  see  that,  as  predicted  in  Chapter  4,  the  CPU 
time  and  memory  needed  for  the  matrix-vector  product  appears  to  grow  about  as  n1*2  for  fixed 
ka. 

The  second  experiment,  shown  in  the  upper  plots,  is  more  difficult  to  explain.  For  each 
available  n-panel  discretization  of  the  sphere,  the  wavenumber  k  was  increased  until  the  maxi¬ 
mum  k  for  which  2%  relative  error  could  be  achieved  was  reached.  Then,  the  precorrected-FFT 
parameters  (p  and  grid  spacing)  were  optimized  to  achieve  minimal  time-memory  product  sub¬ 
ject  to  the  2%  error  requirement.  As  for  each  discretization  this  results  in  ka  >  0.5,  higher 
order  approximations  and  thus  more  computational  resources  were  required  compared  for  this 
variable-A;  experiment  relative  to  the  first  fixed-A;  experiment.  From  Chapter  4  we  expect  the 
time  required  for  a  matrix- vector  product  to  grow  as  n4/3  for  moderate  frequencies,  and  this  is 
indeed  what  is  observed  experimentally.  At  the  largest  k  considered  on  these  plots,  the  sphere 
was  more  than  four  wavelengths  in  diameter. 

It  is  difficult  to  make  a  comparison  of  the  precorrected-FFT  method,  applied  to  the  Helmholtz 
kernel,  to  competing  algorithms,  as  there  is  a  dearth  of  publicly  available  codes.  Instead,  in 
Figure  7-7  we  have  made  a  comparison  to  the  fast  multipole  algorithm  for  the  Laplace  kernel. 
The  rationale  is  that  any  fast-multipole  type  algorithm  for  Helmholtz  kernels  is  likely  to  require 


more  significant  computational  resources  than  a  similar  fast-multipole  algorithm  for  the  Laplace 
kernel,  so  if  the  precorrected-FFT  method,  applied  to  the  Helmholtz  kernel  compares  well  with 
the  Laplace-FMM,  surely  the  Helmholtz-precorrected-FFT  method  will  compare  well  with  a 
Helmholtz-FMM.  In  Figure  7-7  we  compare  the  variable- results  from  the  previous  figure  to 
the  FMM,  using  second-order  multipole  expansions.  In  general,  we  see  that  the  precorrected- 
FFT  method  can  compute  a  matrix-vector  product  for  a  problem  with  an  oscillatory  kernel  in 
about  the  same  time  needed  by  the  fast-multipole  method  for  a  problem  with  a  1/r  kernel. 

Furthermore,  despite  the  increased  memory  requirements  of  the  precorrected-FFT  method 
in  the  Helmholtz  case,  for  all  the  examples  considered  the  precorrected-FFT  method  required 
less  memory  than  the  Laplace-fast-multipole  method  executed  on  the  same  geometry. 

To  gain  some  further  understanding  of  the  efficiency  of  the  pre-corrected  FFT  method  at 
non-zero  frequencies,  consider  Table  7-1,  which  shows  statistics  for  solution  of  the  Helmholtz 
equation  on  the  61440-panel  sphere.  At  low  frequencies,  up  to  about  five  wavelengths  per  sphere 
diameter,  the  memory  usage  and  CPU  time  required  for  a  matrix- vector  product  increases  only 
slightly  over  the  static  code  (primarily,  the  increase  reflects  the  differences  between  real  and 
complex  arithmetic,  with  associated  storage  implications).  As  the  frequency  increases  to  the 
point  where  the  problem  domain  encompasses  several  wavelengths,  however,  the  code  starts  to 
slow  significantly.  Primarily  this  is  due  to  a  jump  in  the  required  grid  order,  from  p  =  3  to 
p  =  4  around  the  six  wavelengths/diameter  point.  Clearly  what  is  needed  is  a  way  to  smoothly 
increase  accuracy  of  the  approximation  without  the  large  jumps  in  computation  time  required 
by  changing  grid  order.  One  possibility  is  to  include  more  panels  in  the  direct  interaction  list. 
Another  is  to  modify  the  FFT  code,  as  at  present  it  only  allows  array  sizes  which  are  powers  of 
two.  It  is  also  apparent  that,  even  when  using  the  well-conditioned  combined-layer  formulation, 
some  sort  of  preconditioner  for  the  iterative  scheme  would  be  useful  higher  frequencies,  as  the 
tendency  for  the  number  of  iterations  to  increase  with  wavenumber  indicates  the  linear  system 
is  becoming  ill-conditioned. 

Finally,  we  compare  the  accuracy  of  the  grid-collocation  operators  developed  in  Chapter  2 
to  operators  derived  from  polynomial  interpolation.  In  Chapter  5,  we  showed  that  for  k  =  0, 
for  equivalent  costs  in  time  and  memory,  the  grid-collocation  operators  produced  more  accurate 
results.  Generally  in  numerical  computations  it  is  possible  to  trade  accuracy  for  computational 
cost,  so  in  this  section  it  is  instructive  to  consider  the  converse  comparison,  i.e.,  to  compare 
the  computational  costs  of  the  two  schemes  for  a  fixed  error  metric.  Figure  7-8  demonstrates 
that  for  the  fixed  error  ceiling  of  1%  relative  in  the  charge  density,  the  grid-collocation  method 
is  somewhat  more  efficient.  Chapter  4  predicts  that  the  algorithm  when  using  polynomial 
interpolation  should  show  a  larger  rate  of  growth  than  when  using  grid-collocation.  While 
the  linefit  to  the  polynomial  data  does  have  a  larger  slope,  the  available  data  does  not  seem 
sufficient  to  make  any  firm  claims  about  complexity. 


df  A 

Iterations 

Approximation 

0  (FASTCAP) 

1.14 

10 

l  =  2 

0  (FFT) 

1.0 

1.0 

7 

P  =  3 

wmmm 

1.96 

1.80 

6 

P  =  3 

WEmmmm 

1.96 

1.80 

6 

P  =  3 

2.00 

1.85 

17 

P  =  3 

2.06 

32 

P  =  3 

7.0 

2.83 

| 

44 

p  =  4 

Table  7-1:  Resources  required  to  solve  scalar  Helmholtz  equation  on  a  sphere  of  diameter  d 
discretized  into  61440  panels.  Memory  and  times  are  normalized  to  those  required  for  the 
static  precorrected-FFT  code.  Parameters  chosen  for  1%  relative  error  in  charge  density.  More 
iterations  are  needed  for  ka  =  0  than  ka  =  0.5  because  the  ka  =  0  codes  use  the  less  well- 
conditioned  single-layer  formulation. 


Figure  7-8:  Product  of  cpu  time- memory  for  matrix- vector  products  for  1%  error  solution 
of  Helmholtz  equation  on  sphere,  (o)  polynomial  interpolation  operators,  (x)  grid-collocation 
operators. 
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Accelerated  full-wave  EM  analysis 


Many  aspects  of  engineering  design,  such  as  interconnect  delay  estimation,  signal  integrity 
analysis,  and  electromagnetic  compatibility  applications,  require  understanding  the  electrical 
properties  of  complex  structures.  However,  because  of  the  great  computational  cost,  full-wave 
simulation  of  large,  complex  three-dimensional  structures  is  rarely  used  in  the  engineering 
design  and  optimization  process.  At  best,  a  range  of  representative  structures  is  simulated, 
and/or  a  variety  of  engineering  approximations  are  made  to  derive  either  analytical  models,  less 
computationally  demanding  numerical  models,  or  sets  of  rule-of-thumb  design  guidelines.  While 
these  approaches  are  simple  and  intuitive,  as  interconnect  technologies  become  more  complex 
and  signal  frequencies  rise,  the  task  of  obtaining  simple  yet  accurate  models  of  geometrically 
complex  structures  will  become  more  difficult,  and  time  spent  deriving  and  applying  simple 
models  may  become  burdensome.  Thus,  efficient,  fully  three-dimensional,  full-wave  simulation 
tools  will  be  increasingly  needed  for  engineering  design  of  microelectronic  systems. 

Finite-difference  methods,  particularly  the  finite-difference  time-domain  (FDTD)  technique, 
and  finite-element  methods  are  popular  approaches  to  full- wave  electromagnetic  analysis.  These 
difference-based  schemes  generate  sparse,  but  very  large,  models.  In  contrast,  methods  based  on 
integral  equation  formulations,  such  as  method-of-moments  techniques,  generate  smaller,  but 
dense,  matrix  representations.  While  the  difference-based  methods  can  operate  in  the  time  or 
frequency  domain,  integral-equation  based  codes  are  usually  frequency-domain.  Development 
of  integral-equation  based  tools  has  been  hampered  by  the  high  computational  complexity  of 
dense  matrix  representations  and  difficulty  in  obtaining  and  utilizing  the  frequency-domain 
response. 

In  section  8.1  we  review  the  retarded  partial-element  equivalent  circuit  (rPEEC)  electromag¬ 
netic  formulation[63]  used  for  full-wave  electromagnetic  analysis.  We  emphasize,  however,  that 
our  numerical  methods  are  not  tied  to  this  formulation.  Section  8.2  reviews  the  precorrected- 
FFT  algorithm  used  to  construct  a  sparse  approximate  representation  of  the  rPEEC  model  and 
introduces  notation  which  will  be  useful  in  Chapter  9. 


8.1  The  rPEEC  formulation 


We  begin  by  expressing  the  electric  field  in  terms  of  the  vector  potential  A  and  the  scalar 
potential  0 

E  =  -^-V<j>  (8.1) 

which  after  a  Laplace  transformation  becomes 


E  =  —sA  —  V<f> 


(8.2) 


In  the  Lorentz  gauge,  the  potentials  in  Laplace  domain  are  related  to  the  currents  J  and 
charges  q  by: 

„  r  0-s\x-x'\/c 

(8.3) 


„  lL  r  ,  _  e-s\x-x'\/c 

=  4 nld  X'J{-X''S>>  \x-x>\ 

If  e-s\x-x'\/c 


I  .1  M 

\x  —  xf\ 

The  exponential  factor  in  the  integrals  of  Eq.  8.3,  8.4  represents  a  delay  (retardation)  in  the 
time-domain  which  is  due  to  the  finite  propagation  speed  of  light. 

— *  — * 

Assuming  a  set  of  conductors  in  a  uniform  medium,  in  every  conductor,  Ohm’s  law  E  —  J/a, 
with  a  the  conductivity,  allows  us  to  write 


J(x,  t) 
a{x) 


+  sA(x,t )  +  V(j)  =  0 


(8.5) 


often  known  as  the  electric  field  integral  equation  (EFIE).  Adding  the  continuity  equation 

V  •  J(x,t)  +  sq(x,t)  =  0  (8.6) 

to  Eq.  8.5  produces  a  system  of  equations  which,  given  appropriate  boundary  conditions,  can 
be  solved  for  the  charges  q  on  conductor  surfaces  and  the  currents  J  flowing  in  the  conductors. 

To  achieve  a  method-of-moments[l]  discretization  of  the  problem,  rectangular  elements  of 
constant  current  are  used  to  represent  J,  and  the  charges  are  assumed  to  be  piecewise-constant 
over  rectangles  which  tile  the  conductor  surfaces.  After  discretization,  the  charges  are  elimi¬ 
nated  as  variables,  leading  to  the  single  equation  for  the  vector  of  discretized  currents  j(s ) 

TtP(s)T  +  sR  +  s2L(s)j  j(s)  =  su  (8.7) 

with  T  the  incidence  matrix  for  the  equivalent  circuit.  In  Eq.  8.7,  P{s)  is  a  dense  matrix 
coupling  all  the  capacitive  cells  (it  represents  the  scalar  potential,  or  “capacitive”  interactions). 
Its  elements  are  .  ... 

1  r  r  p-sx-x'/c 

Pijis)  =  J-  f  cfix'd'x6- - -  (8.8) 

47reo  Jx'eSi  JxeSj  \x  —  x\ 
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where  Si ,  Sj  represent  the  i,j th  rectangular  charge  basis  function  respectively.  R  is  a  diagonal 
matrix  representing  restistances.  L(s)  represents  the  inductive  interactions,  and  its  matrix 
elements  are  similar  to  those  of  P(s). 

Eq.  8.7  can  be  written  in  terms  of  a  single  frequency  dependent  matrix  ^4(s)  (not  to  be 
confused  with  A) 


A(s)j(s)  =  su  (8.9) 

Given  a  vector  u  of  voltage  source  excitations  of  the  equivalent  circuit,  Eq.  8.9  can  be  solved 
for  the  discrete  currents  j(sn)  at  a  particular  complex  frequency  sn. 

8.2  Grid-based  matrix  sparsification 

The  difficulty  with  the  rPEEC  formulation,  as  in  all  integral-equation  based  electromagnetic 
solvers,  is  that  it  contains  frequency-dependent  dense  matrices.  That  is,  the  matrices  P(s)  and 
L(s)  have  0(n2)  non-zero  entries,  where  n  is  the  number  of  degrees  of  freedom  in  the  discretized 
system,  all  of  which  must  be  accounted  for  to  obtain  an  accurate  solution.  Presently,  most 
full-wave  method-of-moments  solvers  at  some  point  use  Gaussian  elimination  to  perform  an 
LU-decomposition  of  the  matrix  A(s).  In  that  case,  simply  storing  the  matrix  requires  0(n2) 
memory,  and  0(n 3)  floating-point  operations.  For  a  complex  matrix  of  size  n  =  10, 000,  nearly 
1.5  gigabytes  of  memory  would  be  needed  to  store  the  matrix  in  double-precision,  and  a  machine 
capable  of  100  MFLOPS  would  take  almost  eight  hours  to  perform  a  single  LU-decomposition. 

A  more  effective  approach  is  to  construct  a  sparse  representation  A(s)  of  the  system  de¬ 
scription  A(s),  such  that  multiplication  of  a  vector  y  by  A(s)  can  be  performed  in  close  to  0(n) 
operations  and  storage,  and  such  that  the  difference  ||A(s)y  —  A(s)2/||,  where  ||  ||  denotes 

vector  norm,  is  small  for  an  arbitrary  vector  y.  To  obtain  the  system  response  j{s),  a  Krylov- 
subspace  iterative  matrix  solver  such  as  GMRES[11]  is  used.  Given  a  complex  frequency  so, 
such  algorithms  can  compute  the  solution  j  (so)  to  the  linear  system  of  equations  A(so)j(so)  =  b, 
for  given  b  by  performing  only  matrix- vector  product  operations  with  the  matrix  A(so).  If  the 
number  of  iterations  needed  for  convergence  is  bounded,  the  resulting  algorithm  will  need  close 
to  0(n )  time  and  storage  to  compute  a  solution  at  a  single  frequency  point. 

For  the  quasi-static  (non-retarded)  case,  a  variety  of  effective  algorithms  is  available,  the 
most  well-known  of  which  is  the  Fast  Multipole  Method  [64]  used  in  FASTCAP  and  FAS- 
THENRY.  The  oscillatory  Green’s  function  (Eqs.  8.3,  8.4)  used  in  full-wave  electromagnetic 
analysis  is  more  difficult  to  treat,  and  although  fast-multipole  type  and  multigrid  algorithms 
have  been  proposed,  they  are  not  as  mature  as  algorithms  for  the  quasistatic  case. 

In  the  preceding  chapters,  a  multigrid-like[6]  “precorrected-FFT”  algorithm  was  presented 
which  for  not-too- inhomogeneous  geometries  significantly  reduces  the  0(n2)  time  and  memory 
needed  to  compute  a  matrix-vector  product.  As  we  shall  see  in  Chapter  9,  this  algorithm  is 


of  particular  interest  in  the  context  of  model  reduction,  as  a  single  algorithm  can  be  used  to 
span  the  entire  range  of  frequencies,  including  zero  frequency,  which  is  not  the  case  for  the  fast 
multipole  algorithms. 


Algorithm  1  (Precorrected-FFT). 

1.  Project  “charge”  q  onto  grid  :  qg~Wq 

2.  Compute  grid-charge  potentials  (j)g  (FFT)  :  <f)g  =  U(s)qg 

3.  Interpolate  grid  potentials  :  yg  —  WT(j)g 

4 .  Add  local  interactions  :  y  =  yg  +  D(s)q 


To  derive  the  algorithm,  first  consider  the  evaluation  of  the  sum 

y(xj)  =  Y^g(\xi-Xj\,s)q(xi)  i,j  =  l...N  (8.10) 

i 

for  some  set  of  N  discrete  “charges”  q  at  points  Xi,  and  evaluating  the  “potential”  y  (as  given 
by  the  Green  function  g(\xi  -  xj\)  )  at  all  the  other  charge  positions  xr  This  sum  may  be 
approximated  in  a  four  step  process  (Algorithm  1).  A  uniform  grid  is  introduced  which  covers 
the  problem  domain  (note  that  the  grid  is  not  in  any  way  linked  to  the  underlying  problem 
discretization).  The  first  step  is  to  represent  the  charge  q  on  the  grid.  By  this  we  mean  that 
for  each  charge,  a  small  set  of  point  charges  that  lie  on  the  grid  and  surround  the  charge  being 
“projected”  is  used  to  approximate  the  long-range  potential  of  the  charge.  Second,  the  potential 
of  all  the  grid  charges  is  computed  at  the  grid  points.  This  operation  can  be  accomplished  in 
several  ways,  the  simplest  of  which  is  by  use  of  the  FFT.1  Third,  the  potential  on  the  grid 
is  interpolated  onto  the  evaluation  points.  Finally,  since  the  grid  representation  will  only  be 
accurate  far  away  from  the  charge  being  approximated  [6,  38],  the  potential  of  nearby  charges 
must  be  computed  exactly. 

If  we  introduce  the  auxiliary  variables  qg  for  the  grid  charges  and  (fig  for  the  grid  potentials, 
we  can  write  the  precorrected-FFT  algorithm  as 

qg  =  W(s)q  (8.11) 

c/>g  =  U(s)qg  (8.12) 


‘For  highly  inhomogeneous  problems,  a  multigrid  algorithm  with  more  than  two  grid  levels  could  be  used. 


(8.13) 


y  =  D(s)q  +  V{s)<j)g 

where  W(s)  is  a  (possibly  frequency-dependent)  projection  operator,  U(s)  is  a  Toeplitz 
matrix  which  corresponds  to  convolution  of  the  Green  function  with  the  grid  charges  to  give 
grid  potentials,  F(s)  is  an  interpolation  operator,  and  D(s)  is  a  sparse  matrix  which  represents 
interactions  between  nearby  charges.  Thus  the  precorrected  FFT  algorithm  replaces  the  relation 

y  =  G(s)q  (8.14) 

with  dense  matrix  G(s)  by  an  approximate  representation 

y  =  [D(8)  +  V(s)U{s)W(s)]q  (8.15) 

U{s)  is  dense,  but  due  to  the  use  of  the  FFT  it  possesses  a  sparse  factorization,  so  the 
representation  of  Eq.  8.15  is  effectively  sparse. 

The  problem  of  perfoming  a  multiplication  by  the  matrix  P(so),  for  some  so>  is  very  similar. 
In  this  case  the  Green  function  is  given  by 

e-So\Xi-Xj\/c 

g(\xj  -  x,-|,*0)  =  |  _  |  (8-1.6) 

\Xi  Xj  | 

where  c  is  the  speed  of  light.  The  potential,  instead  of  being  due  to  point  charges,  is  now  given  by 
a  uniform  charge  density  over  the  surface  of  a  rectangle,  and  the  potential  is  not  just  evaluated  at 
one  point,  but  integrated  over  the  rectangle  surface.  For  sufficiently  separated  panels,  however, 
the  integral  is  well  approximated  by  the  potential  of  a  point  charge  at  a  rectangle  centroid 
evaluated  at  the  other  rectangle’s  centroid.  Thus,  the  interpolation  and  projection  operators 
V,  W  can  be  taken  to  correspond  to  point  charges,  and  so  the  precorrected-FFT  algorithm  we 
have  used  for  rectangular  elements  differs  from  the  point-charge  case  only  in  the  way  the  local 
interactions  are  treated. 

As  will  be  clear  later,  for  the  purposes  of  model  reduction  it  is  convenient  to  choose  the 
interpolation  and  projection  operators  to  be  frequency-independent.  The  simplest  such  choice 
corresponds  to  polynomial  interpolation.  As  is  discussed  in  Chapter  2  and  [6,  38],  the  problems 
of  projection  (step  2)  and  interpolation  (step  3)  are  completely  equivalent.  Thus,  since  a  sym¬ 
metric  discretization  technique  was  used  in  the  rPEEC  formulation,  the  interpolation  operator 
V  is  simply  the  transpose  of  the  interpolation  operator  W,  and  the  sparse  factorization  of  the 
/l(s)  matrix  is 

A(s)  =  D(s)  +  WTU(s)W  (8.17) 

It  can  be  concluded  from  the  results  of  Chapters  2  and  7  that  Algorithm  1,  resulting  in  the 
factorization  of  Eq.  8.17,  calculates  a  product  operation,  including  the  effects  of  all  long-range 
interactions,  to  engineering  accuracy  when  even  fairly  low-order  interpolation  operators  are 
used[6,  38]. 


Ill 


Model  Reduction  Issues 


Problems  with  oscillatory  kernels,  such  as  the  Helmholtz  problems  discussed  in  Chapter  7 
and  the  full- wave  electromagnetic  models  presented  in  Chapter  8,  introduce  two  difficulties  when 
compared  to  the  simple,  static  Laplace  equation.  First,  the  oscillatory  kernel  is  more  difficult 
to  approximate.  We  have  already  shown  that  the  precorrected-FFT  method  can  overcome  this 
difficulty.  Second,  often  what  is  needed  is  to  determine  a  system  response  as  a  function  of  a 
parameter  such  as  frequency.  Thus  in  principle  many  parameter-dependent  integral  equations 
must  be  solved,  as  opposed  to  the  case  of  electrostatic  analysis,  where  for  a  given  geometry  only 
a  single  integral  equation  need  be  solved  (although  possibly  for  several  different  right-hand-side 
excitations). 

The  simplest  approach  to  obtaining  the  frequency-dependent  response  is  to  solve  the  relevant 
integral  equation  at  a  set  of  discrete  frequencies.  At  each  frequency  the  solution  can  be  obtained 
via  the  precorrected-FFT  accelerated  iterative  scheme  discussed  in  the  previous  chapters.  There 
are  two  primary  difficulties  with  this  approach.  First,  it  is  inefficient  as  it  does  not  exploit  the 
fact  that  the  solutions  are  usually  smoothly  varying  as  a  function  of  frequency.  It  should  be 
possible  to  simultaneously  obtain  the  integral  equation  solutions  for  two  frequency  points  that 
are  close  together  with  less  effort  than  if  the  solution  at  each  of  the  two  points  were  performed 
completely  independently.  Secondly,  the  frequency  response  may  contain  contributions  from 
poles  near  the  imaginary  axis.  Such  poles  generate  sharp  spectral  features  that  are  difficult  to 
resolve  using  discrete  sampling  on  the  imaginary  axis,  but  they  can  be  resolved  in  an  efficient 
manner  using  rational  function  approximation. 

In  this  chapter,  we  introduce  a  hybrid  algorithm  which  incorporates  features  of  orthogonal- 
ized  Krylov  methods[65]  and  the  series- expansion  based  methods[66]  to  construct  a  multipoint 
rational  approximant,  for  a  system  with  distributed  elements,  in  a  manner  similar  to  the  “com¬ 
plex  frequency  hopping”  (CFH)  multipoint  algorithm[31].  The  resulting  series-Krylov  (SKCFH) 
algorithm  in  combination  with  a  multilevel  transform  representation  yields  an  efficient  and  el¬ 
egant  solution  of  the  electromagnetic  problem.  We  demonstrate  that  an  algorithm  based  on 


application  of  a  novel  model-order  reduction  scheme  directly  to  the  sparse  model  generated  by 
a  fast  integral  transform  has  significant  advantages  for  frequency-  and  time-domain  simulation. 

This  method  can  obtain  a  high-order  rational  approximation  (or  reduced-order  model)  of  the 
sparse  representation  (Eq.  8.17)  in  a  numerically  stable  manner.1  The  high  order  is  necessary  in 
order  to  represent  the  fine  spectral  features  and  retardation  in  the  frequency  response  of  typical 
electromagnetic  models.  Explicit  moment-matching  techniques  are  not  numerically  stable  and 
will  not  be  sufficient  for  this  problem.  Orthogonalized  Krylov  subspace  methods,  such  as  the 
nonsymmetric  Lanczos  algorithm  or  the  Arnoldi  method  [29,  28,  65],  while  useful  for  stably 
constructing  arbitrarily  high  order  approximants  of  lumped  systems,  are  not  directly  applicable 
to  distributed  (delay)  systems  such  as  transmission  lines  or  rPEEC.  We  extend  the  techniques 
aforementioned  to  created  a  series-Krylov  CFH  (SKCFH)  algorithm  which  is  efficient  (requires 
few  expansion  points),  numerically  stable,  and  applicable  to  model  reduction  of  distributed  and 
delay  systems. 

9.1  Rational  approximation  via  orthogonalized  Krylov  meth¬ 
ods 

Before  considering  the  generate  rational  approximation  problem  for  the  fullwave  rPEEC 
model  presented  in  Chapter  8,  it  is  instructive  to  consider  approximate  models  of  lumped  linear 
dynamical  systems,  as  model  reduction  procedures  for  these  systems  are  well-developed. 

Consider  the  Laplace  domain  system  description  of  a  linear  lumped  network, 

sAx  =  x  +  b  (9.1) 

y  =  cTx  (9-2) 

where  A  is  the  matrix  desciption  of  the  system  that  relates  the  entries  of  the  input  vector  b  to 
the  set  of  unknown  system  variables  in  the  vector  x,  and  y  is  the  vector  of  outputs  obtained 
from  the  internal  system  variables  via  a  mapping  vector  cT .  It  is  clear  that  the  frequency 
dependent  response  y(s)  of  such  a  system  will  be, 

y(s)  =  -cT(I  -  sA)~1b.  (9.3) 

The  response  y(s)  is  a  rational  function  of  the  Laplace  frequency  s,  so  it  is  reasonable  to 

consider  approximating  y(s)  via  a  lower  order  rational  function.  Moment  matching  methods 
use  the  sequence  of  moments  cTb ,  cTAb,  cT  A2b . . .  cTAkb  to  obtain  a  Pade  approximation  to  y(s). 
In  finite  arithmetic,  at  some  k  (which  could  be  small!)  the  vectors  Akb  and  Ak~1b  will  no  longer 
be  linearly  independent  and  the  process  breaks  down.  No  accurate  higher  order  approximation 
can  then  be  obtained. 


1Note  that  the  use  of  the  word  “stable”  in  this  section  refers  to  numerical  stability  of  the  model  reduction 
algorithm,  not  time-stability  of  the  reduced-order  model,  which  is  an  important  but  separate  issue. 
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In  order  to  obtain  a  more  stable  algorithm,  we  consider  algorithms  that  operate  explicitly 
in  the  Krylov  subspace 

K{A,b}  =  {b,Ab,A2b,---}  (9.4) 

of  the  matrix- vector  pair  {A,  b}.  In  orthogonalized  Krylov-subspace  approaches  to  model  reduc¬ 
tion,  a  reduced  order  matrix  model  is  constructed  based  on  a  set  of  vectors  that  span  the  Krylov 
subspace  K{A,  b}  (and  possibly  /C{AT,c}).  By  retaining  an  orthogonality  relation  among  the 
vectors,  linear  independence  can  be  maintained,  and  so  high  order  rational  approximants  can 
constructed. 

For  the  model  reduction  problem,  at  least  two  alternative  Krylov  subspace  algorithms  exist: 
the  nonsymmetric  Lanczos  algorithm  and  the  Arnoldi  method.  We  have  chosen  the  Arnoldi 
method  for  several  reasons.  First,  a  rational  approximant  for  all  the  variables  in  the  system  is 
directly  and  naturally  generated,  which  is  useful  in,  for  example,  electromagnetic  compatibility 
analysis.  Second,  for  our  dense-matrix  problems  and  low-order  (compared  to  the  system  size) 
approximations,  we  do  not  expect  the  full  orthogonalization  needed  in  the  Arnoldi  method  to 
be  as  costly  as  in  sparse  matrix  problems  where  the  nonsymmetric  Lanczos  method  might  be 
more  efficient.  Finally,  our  approach  to  distributed  systems  is  easier  to  describe  and  implement 
using  the  Arnoldi  approach. 


Algorithm  2  (Arnoldi). 

n  =  b/\\b\\ 
for  k  =  1,2, ...  q 
w  =  Avk 

Hitk  =  v?w  i  = 
w  =  w-HitkVi  i  =  l,...,k 

vk+i  =  WIN  I 

Hk+i,k  =  INI 

end 

Vq  =  [Vl  V2  ...  Vq] 


The  Arnoldi  algorithm  applied  to  the  matrix  pair  A ,  b  for  q  steps  generates  q+1  orthonormal 
vectors  spanning  the  subspace  /C{A,  b}  as  the  q  columns  of  the  matrix  Vq  and  the  vector  vq+ i. 
As  a  product  of  the  orthogonalization  procedure,  the  method  produces  a^xg  upper  Hessenberg 
matrix  Hq  and  scalar  hq+ \,q  which  satisfy, 

AVq  —  VqHq  + 


(9.5) 


where  eq  is  the  9th  unit  vector.  It  can  be  shown  that  a  matrix  rational  function  Gf(s)  = 
\\b\\2cTVq(I  —  sHq)~1e  1  is  a  reduced  order  model  of  the  original  system,  Eq.  9.3,  which  matches 
its  first  q  —  2  moments  [65].  Generally  an  accurate  model  can  be  obtained  with  q  much  smaller 
than  the  system  dimension.  The  only  operations  with  the  matrix  A  which  are  needed  are  matrix- 
vector  product  operations,  making  the  method  attractive  for  sparse  systems.  The  difficulty  with 
applying  these  algorithms  to  distributed  systems  such  as  those  described  by  rPEEC  is  that  these 
systems  have  a  frequency-dependent  A  matrix,  A(s) .  In  order  to  be  able  to  apply  the  Arnoldi 
method  to  distributed  systems  such  as  networks  with  transmission  lines,  or  rPEEC  models,  we 
must  construct  a  first-order  lumped  system  description. 


9.2  Reduction  of  distributed  systems 

The  fullwave  rPEEC  models  are  not  lumped  linear  models  as  they  contain  delay  factors  of 
the  form  e~sT.  Consider  the  Laplace  domain  representation  of  a  general  distributed  system 


A(s)a:  =  b 

(9.6) 

II 

(9.7) 

where  we  have  allowed  the  system  matrix  A(s)  to  have  arbitrary  frequency  dependencies.  In 
general,  Eq.  9.6  may  describe  an  infinite-order  linear  system.  That  is,  the  Taylor  expansion  of 
the  matrix  operator  A(s)  may  contain  infinitely  many  non-zero  terms.  To  utilize  the  Arnoldi  al¬ 
gorithm  for  model  reduction,  we  must  first  convert  this  infinite-order,  finite-dimensional  system 
into  a  first-order,  infinite  dimensional  system. 

One  way  to  obtain  a  first-order  representation  of  Eq.  9.6  is  to  expand  A(s)  in  a  Taylor  series 

2  3 

A(s)  =  A0  +  sA1  +  ^A2+S-A2  +  ...  (9.8) 

The  equivalence  above  is  satisfied  for  all  values  of  s  if  the  matrix  A(s)  is  entire  in  the  complex 
plane;  that  is,  if  it  contains  no  entries  with  finite  singularities.  In  that  case  the  radius  of 
convergence  of  the  Taylor  series  is  infinite.  This  is  true  in  the  case  of,  for  example,  rPEEC 
circuits  where  A(s)  contains  only  algebraic  entries  and  delay  factors  (exponentials  in  the  Laplace 
domain) . 

Substituting  Eq.  9.8  into  Eq.  9.6  and  multiplying  by  Aq 1,  the  system  becomes 


2  3 

I  +  sA\  +  +  "*"37  + 


x  =  A0 1b 


(9.9) 


where  Ak  =  A0  lAk  and  I  is  the  unity  matrix.  We  recursively  define  new  variable  vectors  as 
follows, 

s  s  s 

x0  =  x,  x1  =  -x0,  X2  =  -Xl ,  •••,  Xk  =  Y+lXk’ 


« 


« 


« 


« 
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The  system  of  Eq.  9.6  becomes 


r 

l 

1 

h-* 

1 

to 

1 

co 

1 

_ i 

> 

Xo 

'  A^b  ' 

1/2 

XI 

0 

<I-(s) 

1/3 

> 

= 

0 

1/4 

^3 

0 

< 

m 

4 

: 

• 

y(s)  =  cT  0  0  0... 


xi(s) 

X2  (s) 
Z3(s) 
®4(s) 


(9.10) 


(9.11) 


or 

(I  -  s'A)x  =  b  (9.12) 

y(s)  =  cTx(s)  (9.13) 

This  is  a  first  order  system  of  infinite  dimension  which  allows  us  to  apply  a  Krylov-subspace 
based  model-reduction  algorithm,  just  as  for  the  system  of  Eq.  9.6.  More  importantly,  for  finite 
order,  the  model-reduction  algorithm  can  be  executed  in  a  finite  number  of  steps  since  the  start¬ 
ing  vector,  &,  is  bottom  sparse  (only  the  first  n  entries  are  non-zero).  When  an  Arnoldi  process 
is  used  to  generate  a  upper-Hessenberg  representation  of  the  matrix  A,  from  the  structure  of 
A ,  each  Arnoldi  vector  Vk  will  have  kn  non-zero  entries  in  it.  So,  at  order  k ,  the  number  of 
original  n-size  matrix  products  required  to  obtain  the  next  Arnoldi  vector  v^+i  will  be  k.  Thus 
total  storage  and  computational  costs  will  be  0(q2n)  for  an  order-g  model. 

Now  we  apply  this  algorithm  to  the  sparse  rPEEC  factorization  (Eq.  8.17).  Assuming  a 
Taylor  expansion  about  a  complex  frequency  so,  the  matrices  in  the  Taylor  expansion  of  the 
system  description  are 


Ak  =  Dk(so)  +  WTUk(So)W  (9.14) 

where  Dk  and  Uk  are  the  Taylor-expansion  matrices  of  D(s)  and  U(s).  To  implement  the 
model  reduction  algorithm,  it  is  necessary  to  compute  matrix-vector  products  with  the  ma¬ 
trices  Ak,k  >  0,  and  solve  linear  systems  of  the  form  Aox  =  b  for  arbitrary  b.  Step  k  of 
the  Arnoldi  model-order  reduction  algorithm  requires  k  products  with  Ai, . . . ,  Ak  and  one  lin¬ 
ear  system  solution  with  Aq.  The  Ak  products  are  straightforward.  Since  it  is  possible  to 
compute  matrix- vector  products  with  Aq,  the  linear  systems  can  be  solved  using  a  Krylov- 
subspace  iterative  algorithm  such  as  GMRES[11].  Several  techniques,  such  as  preconditioning 
and  Krylov-subspace  recycling[67,  68,  69],  can  be  used  to  accelerate  the  convergence  of  the 
iterative  algorithm,  but  we  will  not  discuss  them  in  detail  here. 


One  principle  difficulty  with  the  proposed  algorithm  is  that  as  the  matrices  D/.,  Uk  are 
needed  for  many  k,  either  they  must  all  be  precomputed  and  stored,  at  a  great  increase  in 
memory  cost,  or  they  must  be  computed  in  runtime.  Generally  the  memory  requirements  are 
limiting,  so  at  each  order  q,  the  information  needed  to  compute  products  with  Dk,Uk,k  <  q 
must  be  computed.  In  particular,  a  precorrection  step  is  needed  for  each  D*.  This  greatly 
increases  the  effective  cost  of  a  matrix-vector  product  relative  to  the  single-frequency  case. 
That  is,  products  with  the  q  terms  At,  k  =  1 . . .  q  cost  much  more  than  k  times  the  cost  of  a 
single  product  with  Ao  (assuming  as  is  the  usual  case  that  the  information  needed  to  represent 
Ao  has  been  precomputed).  On  the  other  hand,  if  only  a  few  orders  in  the  Taylor  expansion 
are  needed,  for  example  at  low  frequencies  (and  regardless  of  model  order),  each  of  the  £>&,  Uk 
matrices  can  be  precomputed,  and  the  resulting  combined  algorithm  is  quite  efficient. 


9.3  Multipoint  rational  approximants 

Our  approach  is  based  on  a  Taylor  series  expansion  of  the  exponential  function,  and  for 
large  arguments,  corresponding  to  frequencies  far  from  the  expansion  point,  it  is  not  possible 
to  accurately  sum  an  arbitrary  number  of  terms  in  this  series.  Thus,  unlike  Krylov  methods 
for  lumped  systems,  we  cannot  obtain  accurate  models  of  arbitrarily  high  order  from  a  single 
expansion  point.  This  is  not  a  concern,  however,  as  it  is  more  efficient  to  obtain  the  rational 
approximant  from  moderate-order  expansions  about  multiple  expansion  points,  rather  than  a 
single  high  order  expansion  [70],  particularly  since  the  cost  of  the  model  generation  grows  as 
q2.  Our  approach  is  similar  to  complex  frequency  hopping[31]  where  several  explicit  moment¬ 
matching  expansion  points  (hops)  are  used  to  generate  a  transfer  function  over  a  broad  frequency 
range.  Once  it  is  determined  that  an  approximation  has  reached  its  limit  in  accuracy  and/or 
CPU  usage  then  another  expansion  point  is  chosen.  From  CPU  considerations  hops  of  order  20- 
40  seem  to  be  optimal.  Several  options  exist  to  utilize  the  information  from  multiple  expansion 
points.  A  CFH-style  algorithm,  based  on  analyzing  converged  poles,  can  be  used  to  construct 
a  single  reduced-order  model  by  extracting  the  poles  and  residues  at  each  expansion  point  and 
rejecting  inaccurate  ones.  Pole  convergence  may  be  determined  by  identifying  common  poles 
in  neighboring  expansions,  as  in  CFH,  or  by  using  estimates  based  on  residuals  of  Ritz  pairs, 
which  are  available  directly  from  the  Arnoldi  process[28,  65].  Alternatively,  the  two  expansion 
points  can  be  deemed  to  be  accurate  in  the  range  between  them  when  their  frequency  response 
matches  at  some  intermediate  point,  and  the  full  frequency  response  obtained  via  piecewise- 
rational  function  interpolation  using  the  frequency  responses  from  the  various  expansion  points 
as  interpolation  functions. 


GHz 


Figure  9-1:  Frequency  response  of  the  two-strip  example.  Left  (a):  Amplitude  of  driven  cur¬ 
rent  vs.  excitation  frequency,  100  MHz  to  3  GHz.  Solid  lines  show  directly  computed  response 
and  Arnoldi-based  approximation  of  dense  model.  Dashed  line  shows  AWE  approximation  of 
dense  model.  Y  shows  response  at  selected  frequency  points  of  sparse  model,  computed  using 
the  iterative  algorithm  GMRES.  Right  (b):  maximum  amplitude  of  electric  field  at  3  meters 
from  structure.  Solid  line  shows  reduced-order  model  of  dense  system.  Dashed  line  shows 
reduced-order  model  of  approximate  sparse  system,  Y  shows  full  dense  model. 


9.4  Computational  Examples 


We  now  consider  two  representative  examples  which  illustrate  various  aspects  of  the  model- 
order-reduction  algorithm  as  applied  to  the  sparse  rPEEC  representation.  The  first  example  is 
a  simple  configuration  of  two  strip  conductors,  intended  to  illustrate  the  capabilities  of  the  new 
model-order  reduction  algorithm.  Two  parallel  30cm  long  strips,  5cm  apart  and  1cm  wide,  are 
driven  at  one  end  by  a  voltage  source  with  50  ohm  internal  impedance,  and  are  terminated  at 
the  other  end  by  a  10  ohm  resistive  load.  Figure  9-1  shows  the  magnitude  of  the  current  driven 
through  the  load  by  a  unit  voltage  source,  as  a  function  of  frequency. 

We  have  calculated  the  response  using  explicit  single  point  moment-matching,  as  well  as 
the  Arnoldi-based  model-reduction  algorithm  applied  to  the  dense  rPEEC  model,  by  solving 
iteratively  for  the  frequency  response  using  the  sparse  rPEEC  model,  and  by  performing  matrix 
factorization  of  the  dense  rPEEC  model.  The  best  AWE  approximant  it  was  possible  to  obtain 
used  20  moments.  Moment-matching  can  resolve  the  first  resonance  and  detect  the  presence  of 
the  second,  but  cannot  match  the  actual  frequency  response  past  about  0.8  GHz.  On  the  other 
hand,  the  Arnoldi  based  approach  was  able  to  accurately  match  the  frequency  response  over 
a  3GHz  range  using  an  85th  order  model.  We  stress  that  this  is  actually  an  example  where 
moment-matching  performs  relatively  well.  Examples  are  easily  constructed [29]  for  which  a 
single  point  moment  expansion  breaks  down  after  only  a  few  moments,  in  which  case  virtually 
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Figure  9-2:  Left  :  a  large  interconnect  structure.  Vertical  axis  is  not  to  scale.  Right:  Current 
driven  by  voltage  source  through  resistive  termination  to  ground  plane.  Dotted  line:  order- 
20  Arnoldi  model  with  s0/(27t)  =  0.5  +  2 i  GHz.  Dash  line:  order-15  Arnoldi  model  with 
so/(27t)  =  0.5  GHz.  Solid  line:  response  obtained  without  model  reduction,  as  well  as  the 
piecewise-rational  approximation  obtained  from  combining  models  at  separate  expansion  points; 
the  results  overlap.  Note  that  the  expansion  at  0.5  +  2i  Ghz  gives  a  very  good  approximation 
up  to  3  GHz  but  is  not  good  near  s  =  0  whearas  the  opposite  is  true  of  the  expansion  at  0.5 
GHz. 

no  information  about  the  frequency  response  can  be  obtained.  CFH,  using  multiple  expansions, 
will  require  too  many  expansions  due  to  the  limitation,  once  again,  of  a  single-point  explicit 
moment-matching. 

Figure  9-1  (b)  demonstrates  that  the  Arnoldi-based  model-order  reduction  applied  to  the 
sparse  model  generated  by  the  precorrected-FFT  algorithm  generates  a  rational  approximant 
for  all  the  currents  that  is  sufficently  accurate  for  radiated  field  calculations.  The  maximum 
amplitude  of  the  radiated  field  was  calculated  at  3m  from  the  parallel  strips.  The  reduced-order 
model  of  the  dense  system  achieves  essentially  an  exact  match  to  the  actual  field  amplitude. 
The  reduced-order  model  of  the  precorrected-FFT  method  is  only  slightly  less  accurate. 

The  second  example  is  a  large  multiconductor  interconnect  structure,  which  illustrates  the 
capability  of  the  combined  multilevel/model-order-reduction  algorithm  to  analyze  very  large, 
complex  problems.  Several  signal  traces,  200  /xm  wide  run  3mm  over  a  10cm  wide  ground  plane, 
as  shown  in  Figure  9-2.  A  narrow  mesh  of  traces  is  located  over  the  signal  paths,  6mm  above 
the  ground  plane. 

An  automatic  discretization  routine  requires  that  the  ground  plane  contain  an  “image” 
discretization  of  the  narrow  traces,  resulting  in  a  final  discretization  with  a  large  number  of 
unknowns.  The  full  model  contains  5270  capacitive  nodes  and  9161  inductor  currents.  The 
final  matrix  which  must  be  factored  has  9129  unknowns,  containing  8.3  x  107  entries.  The  code 


120 


x  10 


X 

X 

X 

X 

X 

X 

X  x 

X 

X 

x  v  x>^ 

X  X*D 

X  X  % 

*x  * 

X 

11 

s,  x  xX 

X 

X  ^ 
> 

X 

X 

X 

X 

X 

X 

X 

_ i _ 1 _ i _ 1 

8 

6 

4 

2 

0 

-2 

-4 

-6 

-8 


-5 


x  10 


9 

~  x  10 

3| - 


- y-1 - 

x  : 

X 

X  : 

- X . 

X 

*  *: 

X  *  O 

. X" 

xx 

X  # 

X  # 

xx 

. x . . . 

X  * 

X  *> 

X 

X 

X  *  o 

x 

i _ i 

x  : 

X 

x  : 

i _ X_i _ 1 

1 _ 1 

l - 

3 1 _ i _ i _ S£_i _ l _ i _ I 

-3-2-1  0  1  2  3 


Figure  9-3:  (x)  Poles  calculated  by  Arnoldi  method.  (*)  poles  calculated  by  moment¬ 
matching.  The  right  hand  plot  is  an  expanded  view  of  the  poles  near  the  origin. 


using  the  full  dense  model  would  need  about  3  gigabyte  of  storage,  with  1.3  gigabytes  needed 
just  to  store  and  factor  the  matrix.  On  a  machine  capable  of  100MFLOPS  sustained,  about 
5.6  hours  would  be  needed  for  a  single  LU-factorization. 

#  In  contrast,  the  sparse  model  generated  by  the  precorrected-FFT  algorithm  has  only  about 

6.2  x  106  non-zero  entries,  corresponding  to  a  factor  of  13  reduction  in  model  size.  The  code 
requires  600  megabytes  of  storage  (a  considerable  portion  of  which  is  for  storage  of  vectors 
needed  for  the  recycled  Krylov-subspace  iterative  matrix  solution  technique),  and  executes  on 
q  a  machine  with  512  megabytes  of  physical  memory  without  swap  activity  in  the  main  solution 

phase.  Figure  9-2  shows  the  response  of  the  computed  reduced-order  model.  It  was  possible  to 
compute  the  order  20  model,  which  nearly  spans  the  entire  frequency  range,  in  about  12  hours 
on  an  IBM  RS6000/560  architecture. 

9.5  Obtaining  time-domain  models 


For  many  applications,  it  is  sufficient  to  obtain  the  frequency-reponse  of  the  system.  Others, 
however,  require  time-domain  information.  At  worst,  once  the  frequency-domain  information 
®  is  available,  a  time-domain  model  can  be  quickly  obtained  by  using  the  methods  of  [71],  for 

example.  A  more  elegant  approach,  however,  would  be  to  use  the  information  available  from 
the  Arnoldi  procedure  to  construct  the  time  domain  response,  as  generally  once  a  rational 
approximant  to  the  frequency  response  is  available,  the  time-domain  response  is  easily  computed 
#  (see  Eq.  9.15  below). 

The  first  difficulty  in  obtaining  time-domain  models  is  that  the  rational  approximation  to  the 
frequency  response  may  contain  poles  that  lie  in  the  right-half  plane.  The  corresponding  time- 
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domain  models  are  thus  not  time-stable.  This  is  a  well-known  problem  for  the  methods  that 
create  Pade  approximants  by  moment  matching.  The  problem  appears  to  be  more  pronounced 
for  the  Arnoldi  approximants  deriving  from  the  series  expansion  of  delay  potentials.  Figure  9-3 
shows  the  calculated  poles  of  the  85th  order  model  of  the  two  30cm  strips  previously  considered 
in  Figure  9-1.  Most  of  the  poles  of  the  Pade  approximant  lie  in  the  left  half  plane;  only  a 
few  stray  poles  are  in  the  right  half  plane.  The  Arnoldi  procedure  in  contrast  generates  poles 
in  a  cardoid-like  pattern  which  extends  well  into  the  right-half-plane.  What  appears  to  be 
occurring  is  that  in  order  to  approximate  the  essential  singularity  at  infinity  of  the  exponential 
function,  as  the  order  of  the  approximation  is  increased,  more  and  more  poles  are  placed  on 
a  circular  or  oblong  arc  encompassing  the  origin,  whose  radius  increases  with  the  order  of  the 
approximation.  In  general  we  have  observed  that  approximants  generated  by  a  non-symmetric 
Lanczos  procedure  (equivalent  to  a  Pade  approximation)  generate  fewer  unstable  poles  than 
the  Arnoldi  procedure,  for  the  type  of  series  expansion  of  delay  potentials  considered  in  this 
paper. 

The  ideal  resolution  to  this  situation  would  be  to  develop  a  multi-point  Krylov  approxi¬ 
mant  which  generated  guaranteed-stable  models.  It  is  believed  that  the  source  of  the  unstable 
approximants  is  the  expansion  of  the  exponential  delay  factor  in  power  series,  so  development 
of  such  an  algorithm  may  require  an  alternative  realization  of  the  delay  terms.  A  workable,  if 
not  ideal  approach,  however,  is  available  which  is  similar  to  the  CFH  [31]  algorithm. 

First,  for  each  pole  in  the  frequency  domain  of  interest,  the  residual  of  the  eigenvalue/eigenvector 
pairs  generated  by  the  Arnoldi  method  is  examined.  If  the  residual  is  sufficiently  small,  and 
the  pole  is  in  the  left  half  plane,  it  is  retained  for  the  final  time-domain  model.  This  procedure 
gives  a  set  of  poles  from  which  a  rational  function  can  be  constructed.  It  is  then  required  that 
the  frequency  response  of  the  final  model  and  the  actual  frequency  response  match  at  a  number 
of  discrete  points  on  the  imaginary  axis  equal  to  the  number  of  poles.  This  gives  a  linear  system 
of  equations  that  can  be  solved  to  obtain  the  residuals.  This  procedure  at  the  same  time  gives 
the  multi-point  approximant,  as  the  frequency  data  needed  to  calculate  the  final  residuals  can 
be  taken  from  the  nearest,  and  most  accurate,  frequency  expansion  points.  Adequate  results 
are  obtained  by  choosing  the  frequency  collocation  points  to  be  equal  to  the  imaginary  parts 
of  the  retained  poles.  Figure  9-4  shows  the  results  of  such  a  procedure. 

A  set  of  30cm  long  strips  was  considered,  with  width  1mm  and  separation  1mm.  Because  of 
the  smaller  transverse  dimensions,  this  model  has  a  strong  frequency  response  up  to  much  higher 
frequencies  than  the  previous  strip  model,  which  had  1cm  transverse  dimensions.  Figures  9-4 
(c)-(f)  show  the  amplitude  of  the  frequency  response  calculated  at  several  different  expansion 
points.  Each  response  shown  is  derived  from  an  order-50  model.  The  four  models  are  patched 
together  to  obtain  the  final  frequency  response,  shown  as  the  solid  lines  in  Figures  9-4(a)  and 
(b).  After  deleting  the  unstable  poles  and  recalulating  the  residues,  the  final  model  is  obtained, 
whose  frequency  response  is  shown  as  the  (+)  in  9-4(a)  and  (b). 
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Figure  9-4:  Frequency  response  functions  for  30-cm  long  strips  with  10-fi  termination  driven 
by  voltage  source  with  50- SI  internal  impendance.  (a)  Magnitude  of  H(s)  =  I(s)/V(s).  Solid 
line  shows  overlapped  multipoint  frequency  response,  (+)  show  frequency  response  of  final 
model  after  deletion  of  unstable  poles  and  recalculation  of  residues,  (b)  phase  of  H(s),  same 
figure  key  as  for  (a),  (c)  Amplitude  of  H(s),  expansion  point  sq  =  0.5  +  0 i  Ghz  (d)  Amplitude 
of  H (s),  expansion  point  s o  =  0.5  +  2?'  Ghz  (e)  Amplitude  of  II (,s),  expansion  point  sq  =  0.5  +  4? 


Figure  9-5:  Time-domain  step  response  of  30-cm  long  strips. 


Once  the  frequency  response  has  been  constructed  as  the  rational  function  h(s ), 

*w  =  E—  (9-15) 

i  S~Pi 

it  is  an  easy  matter  to  obtain  the  time-domain  response 

hit)  =  J2ne^.  (9.16) 

i 

If  the  input  x(t )  has  the  form  of  a  rising  exponential, 

x(t)  =  [l  -  e~at]  u(t )  (9.17) 

where  u(t )  is  the  unit  step  function,  the  output  y(t)  =  x(t)  *  h(t)  is  likewise  easy  to  construct. 
Such  a  form  for  the  input  allows  us  to  examine  the  effects  of  finite-risetime  signals  on  the 
accuracy  of  the  time-domain  model.  (For  such  an  input,  the  risetime  is  about  2.3 /a).  The 
larger  the  risetime,  the  less  accurate  the  frequency  domain  model  needs  to  be  to  accurately 
calculate  the  output  current,  y(t). 

Figure  9-5  shows  the  time-domain  response  of  the  narrow-width  strip  model  to  a  unit  step 
input  with  zero  risetime.  This  figure  demonstrates  several  aspects  of  the  combined  algorithm. 

The  model  is  able  to  give  a  good  qualitative  representation  of  the  dynamic  response  of  a 
structure  where  the  delay  effects  are  very  important.  At  least  four  reflections,  representing 
nine  one-way  traversals  of  the  strip  by  the  wavefront,  are  represented  by  this  single  rational 
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approximation.  However,  there  are  pronounced  spurious  oscillations  in  the  reponse,  especially 
for  small  times.  In  part  this  is  due  to  the  fact  that  we  are  examining  higher  frequencies  than 
those  for  which  the  model  is  accurate,  as  this  model  is  only  accurate  up  to  around  9  Ghz,  but 
the  frequency  response  is  significant  past  that  point. 

If,  however,  the  input  signal  has  finite  risetime,  the  truncation  of  higher-order  frequency 
effects  will  not  be  visible  in  the  final  output  current.  The  effect  of  varying  the  risetime  is  demon¬ 
strated  in  Figure  9-6.  As  the  risetime  is  increased,  the  amplitude  of  the  spurious  oscillations 
becomes  less  pronounced.  From  these  results,  we  conclude  that  certainly  the  model  is  likely  to 
be  accurate  for  frequencies  up  to  8  Ghz  or  so.  Eventually  the  risetime  becomes  long  enough 
that  the  output  current  simply  follows  the  input  -  at  this  point  the  structure  is  behaving  as  a 
60-fl  resistor. 


9.6  Difficulties  of  the  present  approach 

The  integration  of  the  model  reduction  procedure  and  the  precorrected-FFT  method  exposes 
several  shortcomings  of  both. 

The  application  of  the  model  reduction  algorithm  to  the  precorrected-FFT  generated  sparse 
model  is  easier  to  describe  and  implement  if  the  projection  and  interpolation  operators  are 
frequency-independent,  such  as  the  polynomial-based  operators  used  in  this  chapter.  These 
operators,  are,  however,  not  the  most  accurate  available.  It  is  possible  to  adapt  the  algorithm 
to  use  the  more  accurate  projection  operators  obtained  from  grid-collocation  (for  the  far-field 
collocation  operators,  the  frequency  expansion  points  must  be  taken  away  from  the  origin)  at 
a  cost  of  a  more  complicated  algorithm.  It  is  unknown  what  effect  such  extension  would  have 
on  the  required  computational  resources. 

However,  one  advantage  of  using  polynomial  interpolation  is  that  it  is  possible  to  eliminate 
the  precorrection  step  by  modifying  the  kernels  used  for  the  grid-  and  short-range  calculations [6]. 
The  modification  involves  using  an  analytically  “smoothed”  kernel  for  the  grid-potential  calcu¬ 
lation.  The  difference  between  the  “smooth”  kernel  and  the  actual  kernel  can  be  incorporated 
into  the  direct  potential  calculation  without  reference  to  the  grid  since,  for  polynomial  projec¬ 
tion/interpolation  operators,  analytic  forms  are  available  for  the  “smooth”  kernel.  Such  forms 
do  not  appear  to  be  available  for  the  grid-collocation  operators. 

When  local  corrections  must  be  performed  for  every  matrix  vector  product,  this  optimization 
should  result  in  a  significant  savings  in  CPU  time.  If  there  are  n  panels  in  the  system  the  m 
nearest-neighbors  are  included  in  the  grid  calculation,  the  precorrection  cost  is  around  order 
p3m3n.  Even  for  the  lowest  accuracy  likely  to  be  useful  in  practical  applications,  the  constant 
factor  can  easily  be  several  thousand.  For  example,  consider  p  =  3  with  only  nearest-neighbors 
considered  in  the  direct  calculation.  For  each  cell,  to  perform  the  precorrection,  the  potential 
of  the  33  grid  charges  must  be  calculated  at  73  grid  points.  Done  directly,  this  requires  nearly 


10,000  complex  multiplies.  The  operation  could  also  be  done  using  a  3D  FFT,  at  a  similar 
cost.  So,  the  number  of  floating-point  double-precision  multiplications  needed  per  panel  to 
perform  the  precorrection  is  around  40,000  divided  by  the  average  number  of  panels  per  cell. 
Empirically  we  have  observed  that  the  precorrection  step  represents  by  far  the  dominant  cost  in 
the  matrix- vector  product  if  it  must  be  performed  for  each  matrix- vector  multiply.  This  partly 
accounts  for  the  marginal  speedups  obtained,  relative  to  direct  methods,  for  the  full-wave  EM 
code,  when  compared  to  the  speedups  possible  for  the  scalar  Helmholtz  and  Laplace  solvers 
described  in  previous  chapters. 

The  other  reason  for  the  relatively  disappointing  results  have  to  do  with  the  nature  of  the 
iterative  solver.  When  an  iterative  solver  is  used,  at  either  each  frequency  point  or  each  order 
in  the  model  reduction  procedure,  a  new  iterative  solve  must  be  commenced.  It  is  difficult 
for  the  iterative  solver  to  effectively  reuse  information  from  previous  solves.  Reuse  of  the 
Krylov  subspace  is  possible  in  the  model  reduction  procedure,  and  has  been  observed  to  provide 
moderate  performance  improvements,  but  generally  no  reuse  of  information  is  possible  for 
solves  at  differing  frequency  points.  In  contrast,  when  direct  factorization  is  used  in  the  model 
reduction  procedure,  the  cost  at  each  order  is  small  once  the  matrix  has  been  factored  for 
the  first  order  approximation.  Similarly,  when  multiple  solves  at  differing  frequency  points  are 
required,  the  factored  matrix  provides  a  cheap  and  effective  preconditioner  for  an  iterative  solve 
if  the  frequency  points  of  the  solution  point  and  the  frequency  points  for  the  factored  matrix 
are  reasonably  close. 

Additionally,  iterative  solution  procedures  only  have  substantial  advantages  over  direct 
methods  when  applied  to  well-conditioned  matrices,  or,  equivalently,  when  effective  precon¬ 
ditioners  are  available.  The  matrices  generated  by  the  particular  rPEEC  implementation  used 
are  empirically  very  ill-conditioned.  Often  several  hundred  GMRES  iterations  are  required  to 
converge  to  moderate  tolerances.  Increasing  the  strain  on  the  iterative  solver,  the  iterative  solver 
must  be  converged  to  a  relatively  high  tolerance  in  order  for  the  model  reduction  procedure  to 
attain  high  orders  of  approximation. 

The  cost  of  the  Arnoldi-based  model  reduction  procedure  can  also  become  quite  large  at 
high  order.  The  memory  needed  to  store  the  Arnoldi  vectors,  and  the  cost  required  for  the 
back-orthogonalization,  grows  rapidly  with  order  and  generally  limits  the  maximum  order  that 
can  be  achieved  for  large  problems  to  100  or  so  before  the  memory  and  time  needed  by  the 
Arnoldi  procedure  exceeds  that  required  by  the  precorrected-FFT  portion  of  the  code. 

Finally,  this  chapter  has  dealt  only  with  single-input,  single-output  models.  Most  practical 
problems  of  concern,  however,  involve  multiport  models.  The  various  Krylov-subspace  methods 
extend  quite  naturally  to  the  multi-input,  multi-output  (MIMO)  case.  For  a  single  frequency 
expansion  point,  extending  the  algorithm  presented  here  to  MIMO  problems  is  fairly  straight¬ 
forward.  When  multiple  frequency  points  are  required  for  an  accurate  response,  however,  the 
situation  becomes  more  difficult.  Extensions  of  the  multipoint  approximation  procedure  dis- 


cussed  here  are  possible,  but  awkward.  A  true  multipoint  Krylov-subspace  algorithm,  with 
stable  approximations  to  the  delays,  would  be  the  ideal  solution. 


Conclusion 


In  this  thesis,  a  new  approach  was  presented  to  reducing  the  CPU  time  and  memory  required 
to  perform  electromagnetic  analyses  of  complicated  3-D  structures.  The  primary  computational 
tool  developed  was  the  grid-based,  “precorrected-FFT”  algorithm. 

In  Chapter  2  the  problem  of  approximating  singular  potential  functions  was  discussed,  and 
a  grid-collocation  method  described  for  representing  charge  potentials  by  the  potential  of  a 
discrete  set  of  point  charges.  These  results  motivated  the  development  and  analysis  of  the 
precorrected-FFT  approach  in  Chapters  3  and  4,  where  indications  of  the  proposed  algorithm’s 
viability  were  given.  The  final  approach,  using  the  grid-collocation  operators,  was  shown  to 
be  well  suited  to  applications  which  require  solutions  of  integral  equations  with  fixed  param¬ 
eters,  such  as  problems  associated  with  electrostatic  analysis  of  integrated  circuit  packaging, 
on-chip  interconnect,  and  micro-electro-mechanical  systems,  as  discussed  in  Chapters  5  and  6. 
Additionally,  parameter-dependent  problems  such  as  the  scalar  Helmholtz  problems  of  Chapter 
7,  which  could  have  practical  application  in  computational  acoustics,  or  the  full-wave  electro¬ 
magnetic  problems  of  Chapter  8,  can  be  efficiently  treated  when  information  at  a  limited  set  of 
parameter  values  (e.g.,  single  frequency  solutions)  is  desired.  For  such  problems,  for  moderately 
inhomogeneous  geometries  and  moderate  accuracies,  the  precorrected-FFT  approach  was  more 
efficient  than  the  competing  fast  multipole  algorithm,  in  many  cases  by  a  large  factor. 

For  problems  that  require  solutions  over  a  wide  range  of  frequencies,  additional  algorithmic 
innovations  were  necessary.  In  order  to  effectively  utilize  the  sparse  representation  generated 
by  the  precorrected-FFT  method  to  efficiently  compute  the  frequency  response,  it  was  neces¬ 
sary  to  develop  a  numerically  stable  algorithm  for  model  order  reduction  of  systems  with  delay 
elements.  This  model-order  reduction  technique  was  applied  directly  to  the  sparse  model  gener¬ 
ated  by  the  precorrected-FFT  algorithm.  The  overall  algorithm  has  substantial  qualitative  and 
quantitative  advantages  over  standard  full-wave  electromagnetic  analysis  techniques.  Qualita¬ 
tively,  a  continuous  analytic  closed-form  (pole-residue)  expression  for  the  frequency  response  is 
obtained,  instead  of  samples  at  discrete  points,  and  a  pole-residue  model  may  be  obtained  that 


can  be  directly  incorporated  into  time-domain  simulation.  Quantitatively,  for  even  moderate 
sized  problems,  substantially  less  memory  is  required  by  the  new  algorithm,  and  the  frequency 
response  can  be  obtained  over  a  range  of  frequencies  in  the  same  time  as  the  response  is  obtained 
at  a  few  discrete  points  by  standard  techniques. 

In  Chapters  8-9  it  was  demonstrated  that  the  combined  approach  of  incorporating  a  rela¬ 
tively  simple  multilevel  integral  transform  with  the  model  reduction  procedure  can  dramatically 
reduce  the  size  of  interconnect  models  generated  by  complicated  three-dimensional  intercon¬ 
nect  structures.  Compared  to  commonly-used  techniques,  on  a  problem  with  10,000  unknowns, 
the  resulting  algorithm  has  a  fivefold  storage  advantage,  and  computes  a  complete,  continuous 
frequency  response  in  the  same  time  as  standard  techniques  require  for  solutions  at  a  few  dis¬ 
crete  frequency  points.  More  importantly,  the  size  of  the  sparse  model  and  cost  of  generating 
the  system  response  from  it  should  grow  considerably  less  rapidly  with  problem  size  than  for 
standard  techniques. 

In  light  of  these  results  we  can  draw  some  conclusions  about  the  relative  advantages  of  the 
grid-based  algorithms  proposed  herein.  A  major  advantage  of  the  precorrected-FFT  method 
is  that  it  is  based  on  the  FFT  and  local  interpolation  operators,  rather  than  on  spherical- 
harmonics  based  shifting  operators  as  in  the  fast  multipole  method.  Thus,  a  range  of  kernels 
can  be  treated  in  the  method  while  still  preserving  the  high  order  of  accuracy  of  multipole-based 
representations.  This  was  the  major  motivation  for  the  development  of  the  algorithm. 

The  most  salient  drawback  of  the  algorithm  is  its  poor  performance  on  very  inhomogeneous 
problems.  The  use  of  the  FFT  to  evaluate  the  grid  potentials  is  inefficient  in  this  case,  since  most 
of  the  data  in  the  transform  is  zero.  A  similar  problem  occurs  for  in  the  layered  media  problem, 
when  the  panels  are  distributed  among  many  layers  of  greatly  varying  width.  In  principle,  algo¬ 
rithms  can  be  developed  to  evaluate  the  grid  potentials  in  the  strongly  inhomogeneous  case  by 
applying  the  algorithm  of  Chapter  3  recursively  to  obtain  a  multigrid-like  algorithm.  Such  an 
algorithm  would  ideally  inherit  the  efficiency  of  the  grid-based  representation  discussed  in  this 
paper,  while  at  the  same  time  achieving  the  near  0(n)  complexity  of  the  fast  multipole  meth¬ 
ods.  The  extension  is  straightforward  for  smooth  kernels  and  one-dimensional  implementations 
experiments  have  been  encouraging,  although  there  is  some  concern  that  computation  of  local 
interactions  of  the  first  grid  level  could  introduce  inefficiencies  similar  to  those  encountered  in 
Chapter  9  due  to  local  corrections.  For  oscillatory  kernels  a  more  cumbersome  approach,  such 
as  proposed  in  [6],  must  be  followed. 

However,  what  our  investigations  reveal  is  that,  in  the  larger  picture,  the  issue  of  inho¬ 
mogeneity,  while  jarring  to  the  more  mathematically  inclined  thinker,  is,  from  an  engineering 
perspective,  actually  of  secondary  importance.  The  results  presented  in  this  thesis  lead  us  to 
believe  that  the  precorrected-FFT  method  is  adequate  for  most  geometries  of  interest.  What  is 
of  more  immediate  importance  is  to  be  able  to  efficiently  integrate  whatever  sparse  matrix  rep¬ 
resentation  is  chosen  with  algorithms  that  can  obtain  useful  models,  for  example  time-domain 
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models  or  frequency  plots,  rather  than  just  simple  single-point  linear  solutions,  which  are  gen¬ 
erally  useful. 

In  this  sense  the  more  difficult  issue  is  how  to  efficiently  obtain  local  corrections.  The 
approach  of  explicitly  subtracting  the  grid  contributions  from  the  nearby  interactions  is  either 
too  computationally  expensive,  or  too  memory  intensive,  to  incorporate  in  the  present  model- 
reduction  procedure. 

Additionally,  at  high  order,  the  local  correction  step  becomes  too  expensive  to  use  even 
when  amortized  over  many  linear  system  solutions,  such  as  in  capacitance  extraction  and  similar 
applications.  This  limits  the  degree  of  accuracy  that  can  be  achieved  by  grid-based  methods. 
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